
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Lone Mersenne Hunters (https://www.mersenneforum.org/forumdisplay.php?f=12)
-   -   How to use prime95 for stage 1 & GMP-ECM for stage 2 (https://www.mersenneforum.org/showthread.php?t=20092)

Prime95 2015-03-02 00:41

How to use prime95 for stage 1 & GMP-ECM for stage 2
 
[QUOTE=lycorn;396764]Well, finding factors this large is hard. It takes a lot of curves to get one.
I'm currently putting M11423 through the same process. There are still 4000 curves to run @B1=11000000. If nothing comes up, I'll move to B1=44000000. But that, yes, will be really tough...[/QUOTE]

Are you using GMP-ECM for stage 2? Can you shed any light on what size exponents benefit from GMP-ECM stage 2?

lycorn 2015-03-02 00:59

No. I'm using Prime95 for both stages.
Is there any GMP-ECM version for Windows? And if so, can you point me to the binaries?
I'm assuming GMP-ECM brings some advantage over Prime95 for stage 2, otherwise you wouldn't have posed the question.

Prime95 2015-03-02 04:34

I don't know about binaries -- I expect there are some available.

Yes, GMP-ECM, given enough memory, is vastly superior for small exponents. It runs a much deeper stage 2, which lets you run far fewer curves. I'm not sure about exponents near 20000, if you decide to investigate I'd be curious as to what you find out.

You'll need to look up the GmpEcmHook=1 option in undoc.txt.

VBCurtis 2015-03-02 22:25

[QUOTE=lycorn;396775]No. I'm using Prime95 for both stages.
Is there any GMP-ECM version for Windows? And if so, can you point me to the binaries?
I'm assuming GMP-ECM brings some advantage over Prime95 for stage 2, otherwise you wouldn't have posed the question.[/QUOTE]

Here's the repository:
[url]http://gilchrist.ca/jeff/factoring/index.html[/url]

The maintainer is an infrequent poster here.

The fastest approach is Prime95 for stage 1, save the residues, and use them as input for GMP-ECM stage 2. George is asking about memory requirements for exponents under 20k; I have a 16GB system so I may be able to answer this if you cannot. Use the "-v" flag when experimenting with B2 bounds to see how much memory is needed and how many curves make a t45 with each setting of B2. The higher your B2, the higher the chance of finding a factor per curve, so the required curve count for each level decreases.

lycorn 2015-03-03 01:43

[QUOTE=VBCurtis;396830]Here's the repository:
[url]http://gilchrist.ca/jeff/factoring/index.html[/url]

The maintainer is an infrequent poster here.

The fastest approach is Prime95 for stage 1, save the residues, and use them as input for GMP-ECM stage 2. George is asking about memory requirements for exponents under 20k; I have a 16GB system so I may be able to answer this if you cannot. Use the "-v" flag when experimenting with B2 bounds to see how much memory is needed and how many curves make a t45 with each setting of B2. The higher your B2, the higher the chance of finding a factor per curve, so the required curve count for each level decreases.[/QUOTE]

Thanks a lot. I just started trying it. Some questions will most probably arise...
For now, just one: the advantage of running Stage 1 on Prime95 and Stage 2 on GMP-ECM holds only for small exponents, as I understood it. But how small?

Prime95 2015-03-03 04:22

[QUOTE=lycorn;396851]For now, just one: the advantage of running Stage 1 on Prime95 and Stage 2 on GMP-ECM holds only for small exponents, as I understood it. But how small?[/QUOTE]

That is what we'd like to find out. It may depend on how much memory GMP-ECM is allowed to use.

philmoore 2015-03-03 04:37

I had good results running stage 2 with GMP-ECM on M8191 using 3 MB of allocated memory, so I am pretty sure that you should still find it useful on M11423. The one problem I had was that the only way I could get GMP-ECM to log its results was to redirect the entire output to a file, which gave me one huge file. I would have liked the capability to log results only if a factor was found. If anyone has a suggestion of how to do that, I would love to hear about it.

axn 2015-03-03 06:04

[QUOTE=philmoore;396872]I had good results running stage 2 with GMP-ECM on M8191 using 3 MB of allocated memory[/QUOTE]

MB or GB?

lycorn 2015-03-03 15:11

So I have given GMP-ECM a try, and as expected some questions popped up. I don't want to be a PITA, but I can't help bringing some to the thread. May some knowledgeable (and patient) member shed some light on my confused mind... :smile:

I started by running Prime95 ECM, using B1=B2=11000000, on M11423. I inserted the GmpEcmHook=1 line in prime.txt.
All fine: Prime95 ran Stage 1 only and wrote a residue to the results file.
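For reference, the worktodo.txt entry for that stage-1-only run looked something like the line below (I'm writing the ECM2 field order from memory, so double-check it against undoc.txt):

[CODE]ECM2=1,2,11423,-1,11000000,11000000,1[/CODE]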

I copied said residue to a file named work.txt, and ran GMP-ECM using the following command:

ecm -v -resume work.txt 11000000

The program started by printing some lines of info, including the value I had given for B1 and a calculated value B2=30114149530.
It also listed the expected number of curves for finding factors of different sizes.
It also mentioned that it was resuming a residue saved with Prime95, which I took as a sign things were going as expected.
After a while, the message [I]Step 1 took 2510343 ms[/I] popped up. My first question: is this Step 1 the same as Stage 1? If so, what is the point of running it in Prime95 only to have it re-run by GMP-ECM? I was under the impression that GMP-ECM would only run Stage 2 in this "resume" situation.
It then gave a memory usage estimate of 2749M, which is fine, as I have over 6G available. So I think at this point it started running Stage 2. Several lines regarding initializations and computations of different sorts followed, then the message [I]Step 2 took 652360 ms[/I] popped up. Below it, the program displayed a list of expected times to find a factor of n digits.
The last line was: [I]Save file line has no equal sign after: UID: lycorn/ast[/I].

And that was it. I think in general terms it went well, but in order to make some sense of it and use it regularly, I need to understand a couple of things:
First, I was really confused by the fact that Stage 1 was apparently run by GMP-ECM - in fact it took way longer than Stage 2, and also much longer than Stage 1 run by Prime95. As it is resuming a computation that has finished Stage 1, I supposed it would only run Stage 2. How can we get it to do that?

Second, where have the results gone, so they can be reported to Primenet? I suspect nowhere, which means the program doesn't automatically create something similar to the "results" file of Prime95. Does it have to be created at runtime, using the -save switch?

Third, if my interpretation is correct:

- I ran just one curve, which took 2510343 + 652360 ms ~ 52.71 minutes
- The prescribed number of curves for finding a 45-digit factor (in line with B1=11000000) is 4630 (this number was taken from the list given at the end of Stage 1). At the end of Stage 2, the listed expected time to find a 45-digit factor was 169.49 days, which matches the figures for number of curves and duration of each one perfectly.
On my computer, Prime95 runs a curve on M11423 @B1=11000000 in 16 minutes, and the prescribed number of curves is 9700, which gives ~107.8 days. This shows that using Prime95 alone would be much faster.
There's certainly something I'm missing here - I guess it has to do with running Stage 1 on GMP-ECM [B]and[/B] on Prime95: GMP-ECM's Stage 1 takes so long that it more than cancels out the gain from running fewer curves at a higher B2.


Any clarification would be much appreciated.
Thanks in advance

axn 2015-03-03 16:22

I think you need to give both B1 and B2, like so:

[CODE] ecm -v -resume work.txt 1 11e6-30e9[/CODE]

i.e. B1=1, and B2=11e6 to 30e9

VictordeHolland 2015-03-03 16:40

[QUOTE=lycorn;396900]
Last line was: [I]Save file line has no equal sign after: UID: lycorn/ast[/I].
[/QUOTE]
I believe you can't use the userid/computerid when doing stage 1 with Prime95 and stage 2 with GMP-ECM.
It should say something like this:
[code]
Resuming ECM residue saved with Prime95
Input number is x (x digits)
Using B1=1, B2=x polynomial Dickson(x), sigma=x
Step 1 took x ms
Step 2 took x ms[/code]Step 1 should take only a few ms.

philmoore 2015-03-03 17:16

[QUOTE=axn;396878]MB or GB?[/QUOTE]

GB.

YuL 2015-03-03 17:18

[QUOTE=axn;396903]I think you need to give both B1 and B2, like so:

[CODE] ecm -v -resume work.txt 1 11e6-30e9[/CODE]i.e. B1=1, and B2=11e6 to 30e9[/QUOTE]


You could do this:

[CODE]# use default B2
ecm -v -resume results.txt 11e6-11e6
# use B2=11e8
ecm -v -resume results.txt 11e6-11e6 11e8
[/CODE]At least that's what I do when I run stage 1 with Prime95 and stage 2 with gmp-ecm,
because I prefer to have the B1 value in the output:

[CODE]GMP-ECM 6.4.4 [configured with GMP 6.0.0, --enable-asm-redc] [ECM]
Resuming ECM residue saved with Prime95
Input number is 0x16F9A1369338819B52EBC62EFD7B6CA .... 33DB13E1B5331 (148 digits)
[Thu Feb 26 14:29:35 2015]
Using MODMULN [mulredc:0, sqrredc:0]
Using B1=15000000-15000000, B2=46847553490, polynomial Dickson(12), sigma=690944836466098
dF=32768, k=4, d=324870, d2=11, i0=36
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
102 543 3391 23969 191190 1675913 1.6e+007 1.7e+008 1.9e+009 2.2e+010
Step 1 took 16ms
Using 19 small primes for NTT
Estimated memory usage: 104M
Initializing tables of differences for F took 62ms
...

[/CODE]

VBCurtis 2015-03-03 19:09

Lycorn-
You are correct: some mistake in the command/flags caused GMP-ECM to re-run stage 1, wasting time. You can calculate the time per curve for the Prime95/GMP-ECM combo by taking the stage 1 time from Prime95 plus the stage 2 time from GMP-ECM.

You discovered just how much faster Prime95 is at Stage 1!

Once you get the commands figured out, try using B1 = 21M and B2 = GMP-ECM default. See what the -v output predicts for the time to complete a t45. I've found this choice of B1 more efficient for smaller numbers, and am curious about ones this large. Note this will cause stage 2 to use twice the memory, just fitting in your 6GB envelope. Also, note the default t50 B1 of 43/44M will use the same memory as the 21M that I suggest - so we can conclude that GMP-ECM is useful for this exponent size up to t50 on 8GB machines, and up to t55 on 16GB machines. Not bad!

I think memory use scales roughly with exponent size, so t45 B1 = 11M tests should work on an 8GB machine for exponent sizes up to 23,000 or so, and 16GB machines up to nearly 50,000.

VBCurtis 2015-03-03 19:18

Also, the -k option can arrange stage 2 to use half the memory at a small loss in efficiency. Part of your -v output lists a k-value, which is the number of chunks stage 2 is split into (2 minimum; I've used k over 20 for really large B1 runs). B1 = 11M uses a default k of 3. If you issue the flag -k 12 in your command, stage 2 will use half the memory, but take 12 passes through most of the stage 2 computation.
The stage 2 "blocks" increase by factors of 4, so setting -k 6 won't split the difference; the program will adjust B2 to be efficient given your request for 6 blocks.

Your run used a B2 of 30G, and k = 3. If you used k = 4, B2 would be set to 40G. If you use k = 2, I don't know if B2 would reduce to 20G and keep the same block & memory size, or jump to the next block size of 40G for a B2 of 80G.

There are lots of options available! For long runs, minimizing the expected time to complete the level of interest is wise. That said, for two settings that have similar expected times to complete, the setting with higher B1 or B2 will have a higher chance to get lucky with a bigger factor. Different chip architectures and even different machine setups (memory bandwidth, OS, whatever) can respond differently to various settings, so a little experimentation can save you 10% of runtime if you're lucky.

lorgix 2015-03-03 20:39

With some simplifications and assumptions, here's how to optimize it:

Set the highest -maxmem you are comfortable with. Time stage 2 with a reasonable B2. Divide the time taken by ~0.7 (0.695?) to get the time you should spend in stage 1. Find the B1 that brings you there; time usually scales quite linearly with B1. This should be close to an optimal B2/B1 ratio for the given circumstances. If that B1/B2 combo pushes the needed number of curves too high or too low for your preference, just change B2 and start over from the third sentence of this paragraph.
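A concrete sketch of that loop (the numbers are invented for illustration):

[CODE]# cap memory, then time one stage 2 at a trial B2
ecm -v -maxmem 6000 -resume work.txt 1 11e6-30e9
# if Step 2 took, say, 10 minutes, aim for a stage 1 of 10/0.7 ~ 14 minutes,
# scaling B1 up or down until a Prime95 stage 1 curve takes about that long[/CODE]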

bloodIce 2015-03-03 21:13

GMP-ECM could be the program of choice for small exponents. However, is it possible to report the ECM curves to GIMPS? I am not aware of how to do it, and to me it seems a waste of time to do GMP-ECM if another person then needs to redo the work for GIMPS. It would be a private effort.

lycorn 2015-03-03 22:38

That got me wondering as well.
Thanks to all those who answered my previous post, I have just managed to run only Stage 2, and the time for the combined run (Prime95 Stage 1 + GMP-ECM Stage 2) is ~23 minutes. More or less the same as for Prime95 alone, but with the advantage that only 4630 curves are necessary instead of 9700. So yes, it would be nice to use GMP-ECM for this range of exponents and factor sizes. The thing is, how do I report the results to Primenet?
The file obtained by using the -save switch is huge, and contains a residue that is incomprehensible to the server. And that was for just one curve. If one runs 500 or more curves on a given exponent, the save files containing Stage 1 and 2 residues will be ridiculously large. Is it supposed to be that way?
Connection to the server for reporting is something I deem essential.

VictordeHolland 2015-03-03 23:20

[QUOTE=lycorn;396943]The thing is, how do I report the results to Primenet?[/QUOTE]
In the past you could mail your GMP-ECM efforts to George and he would manually add the curves to the database, so no work is duplicated. But keep in mind he is a busy man.

edit:
Could a mod maybe move the posts about Prime95 stage 1 and GMP-ECM stage 2 to a new thread?

lorgix 2015-03-03 23:25

[QUOTE=lycorn;396943]That got me wondering as well.
Thanks to all those who answered my previous post, I have just managed to run only Stage 2, and the time for the combined run (Prime95 Stage 1 + GMP-ECM Stage 2) is ~23 minutes. More or less the same as for Prime95 alone, but with the advantage that only 4630 curves are necessary instead of 9700. So yes, it would be nice to use GMP-ECM for this range of exponents and factor sizes. The thing is, how do I report the results to Primenet?
The file obtained by using the -save switch is huge, and contains a residue that is incomprehensible to the server. And that was for just one curve. If one runs 500 or more curves on a given exponent, the save files containing Stage 1 and 2 residues will be ridiculously large. Is it supposed to be that way?
Connection to the server for reporting is something I deem essential.[/QUOTE]

Did you try optimizing it like I suggested in #930?

And yes, the output from Prime95 gets large with large numbers. It prints the entire number in hexadecimal.

lycorn 2015-03-03 23:37

[QUOTE=lorgix;396945]Did you try optimizing it like I suggested in #930?[/QUOTE]

As I understood it, your post seemed to refer to work done entirely (Stage 1 + 2) with GMP-ECM. I am doing Stage 1 with Prime95, so the choice of B1 is as prescribed in the Primenet pages. I will probably fiddle with B2, and then I'll report back. Thanks for your posts.

As for logging and reporting results, it's a pain we can't do it the Prime95 way. That makes me doubt I will ever use GMP-ECM on a regular basis. Let's see.

lorgix 2015-03-03 23:47

[QUOTE=lycorn;396948]As I understood it, your post seemed to refer to work done entirely (Stage 1 + 2) with GMP-ECM. I am doing Stage 1 with Prime95, so the choice of B1 is as prescribed in the Primenet pages. I will probably fiddle with B2, and then I'll report back. Thanks for your posts.

As for logging and reporting results, it's a pain we can't do it the Prime95 way. That makes me doubt I will ever use GMP-ECM on a regular basis. Let's see.[/QUOTE]
Regardless of Prime95/GMP-ECM or the specific B1/B2, the relationship in time spent should be stage1:stage2 :: 1:0.7.
So if Prime95 takes 10 minutes to complete one stage 1, you should have GMP-ECM use a B2 such that stage 2 takes 7 minutes.

Yes, it would be very nice if this speedup could be incorporated into Prime95.

Prime95 2015-03-04 01:33

[QUOTE=bloodIce;396936]GMP-ECM could be the program of choice for small exponents. However, is it possible to report the ECM curves to GIMPS?[/QUOTE]

Ask James Heinrich if he added a way to do this with the manual results form. If this becomes commonplace, James and I ought to be able to come up with some way to turn in results via the manual results page.

I can do it manually if you send me an email. I need your primenet userid, #curves run, exponent and bounds. Obviously, I don't want to do this very often, so send me a large number of curves -- not a few at a time.


As a bonus to people who use GMP-ECM for stage 2, the server calculates CPU credit as if prime95 had performed the huge stage 2 bound.

lycorn 2015-03-04 08:55

[QUOTE=Prime95;396965]

I can do it manually if you send me an email.

[/QUOTE]

I will probably do some more work with GMP-ECM, to get comfortable with the program.
I will send you a reasonable number of curves, but tell me: will you accept the results "on trust" (much like we report no-factor lines from mfaktc/o to the server), meaning I would write a short message containing UID, exponent, B1, B2, and number of curves, or do you expect to receive some data actually produced by the program? In the latter case, which data?

VictordeHolland 2015-03-04 10:43

Maybe as an interim solution, trusted users could submit their GMP-ECM results in mfaktc/CudaLucas-style fashion?
Some suggestions:
[code]
Mxxxxx completed 1000 ECM curves, B1=11000000, B2=30114149530 [stage2 GMP-ECM 6.4.4 MPIR 2.6.0 win64]
Mxxxxx completed 1000 ECM curves, B1=11000000, B2=30114149530, D(12) [GMP-ECM_7.0_SVN2256_win64]
Mxxxxx completed 1000 ECM curves, B1=11000000, B2=30114149530 [step1 Prime95 28.5 step2 GMP-ECM 6.4.4]
[/code]This is of course sensitive to fraud, but so are mfakto and CudaLucas. The next thing to do would be to add a parameter to GMP-ECM ("-gimps" for instance) that would print this line to screen/file.

Prime95 2015-03-04 14:31

[QUOTE=lycorn;396984]I will probably do some more work with GMP-ECM, to get comfortable with the program.
I will send you a reasonable number of curves, but tell me: will you accept the results "on trust" (much like we report no-factor lines from mfaktc/o to the server), meaning I would write a short message containing UID, exponent, B1, B2, and number of curves, or do you expect to receive some data actually produced by the program? In the latter case, which data?[/QUOTE]

Yes, I accept results "on trust".

R.D. Silverman 2015-03-04 14:39

[QUOTE=lycorn;396775]No. I'm using Prime95 for both stages.
Is there any GMP-ECM version for Windows? And if so, can you point me to the binaries?
I'm assuming GMP-ECM brings some advantage over Prime95 for stage 2, otherwise you wouldn't have posed the question.[/QUOTE]

If you would bother to READ how step 1/step 2 actually work, you might understand
what is involved.

Detailed explanations of how convolution-based implementations of Step 2 work
have been written. Peter Montgomery's thesis is a superb source. An earlier
source would be my joint Math. Comp. paper with Peter on an FFT extension
to P-1.

Issues such as resource (i.e. memory) requirements are discussed, as are complexity
comparisons with earlier (non-convolution) implementations.

lycorn 2015-03-06 23:27

1 Attachment(s)
[QUOTE=Prime95;396771]Are you using GMP-ECM for stage 2? Can you shed any light on what size exponents benefit from GMP-ECM stage 2?[/QUOTE]

I have done a couple of tests to compare Prime95 alone to Prime95 + GMP-ECM for different exponent sizes.
Find attached an Excel spreadsheet with a summary of the results.
I used for each exponent the B1 value currently prescribed in the Primenet [I]Reports -> Detailed Reports -> ECM Progress[/I] page.
In the rightmost column there is, for each exponent, the expected time (in days) to find a factor using the combo Prime95 + GMP-ECM, and the percentage shown below is relative to using Prime95 alone.
For exponents larger than 40K there is no point in using both programs. This is valid for the current level of the search (in terms of factor size), but in practical terms that is what really counts.

lorgix 2015-03-07 02:52

[QUOTE=lycorn;397198]I have done a couple of tests to compare Prime95 alone to Prime95 + GMP-ECM for different exponent sizes.
Find attached an Excel spreadsheet with a summary of the results.
I used for each exponent the B1 value currently prescribed in the Primenet [I]Reports -> Detailed Reports -> ECM Progress[/I] page.
In the rightmost column there is, for each exponent, the expected time (in days) to find a factor using the combo Prime95 + GMP-ECM, and the percentage shown below is relative to using Prime95 alone.
For exponents larger than 40K there is no point in using both programs. This is valid for the current level of the search (in terms of factor size), but in practical terms that is what really counts.[/QUOTE]
You must optimize B2/B1 at each level. In the case of M40253, for example, B2 should be something like a third of what you used. As it is now, you are spending far too much time in stage 2.
And to be fair you should also increase B2 when using Prime95; otherwise the comparison becomes unfair. You must test both with optimal parameters.

lycorn 2015-03-07 12:05

I don't think this is a matter of "fairness". As Stage 2 is faster with GMP-ECM and Stage 1 is faster with Prime95, I was trying to find, for the current needs of the search, where to use Prime95 alone and where to use a combination of the two in order to get tests done faster. I am in no way claiming that one program is "better" than the other. As for the values of B2 used, they are the defaults provided by the program, so I assumed they are optimized. If I reduce the values, the number of curves would certainly be higher, and the smaller number of curves is, as far as I understood it, the main point of running GMP-ECM for Stage 2 instead of Prime95.
There is one caveat, though: performing Stage 1 with P95 and Stage 2 with GMP-ECM is very cumbersome if you want to run, say, 100 or 200 curves on a single exponent. The switches -resume and -c are incompatible, which means the curves have to be run one by one, the relevant residue from P95 Stage 1 manually fed to GMP-ECM, and the programs restarted. Unless I am missing some fundamental issue here, this overhead undermines to a large extent the advantage of running the combination of P95 and GMP-ECM.

axn 2015-03-07 14:41

[QUOTE=lycorn;397228]There is one caveat, though: performing Stage 1 with P95 and Stage 2 with GMP-ECM is very cumbersome if you want to run, say, 100 or 200 curves on a single exponent. The switches -resume and -c are incompatible, which means the curves have to be run one by one, the relevant residue from P95 Stage 1 manually fed to GMP-ECM, and the programs restarted. Unless I am missing some fundamental issue here, this overhead undermines to a large extent the advantage of running the combination of P95 and GMP-ECM.[/QUOTE]

It is the easiest thing in the world to run a batch of curves like this.

[QUOTE=undoc.txt]Alexander Kruppa wrote some code that allows the output of ECM stage 1 to
be passed to Paul Zimmermann's more efficient GMP-ECM stage 2. This program
is usually faster in stage 1. You can activate this feature by entering
GmpEcmHook=1
in prime.txt. Then select ECM bound #2 between 1 and bound #1. Results.txt
will contain data that can be fed to GMP-ECM for stage 2.[/QUOTE]

Run however many curves you want in Prime95 with the above setting. It'll run only stage 1 and save the residues for all the curves.

Then:[QUOTE=gmpecmhelp] -resume file resume residues from file, reads from stdin if file is "-"[/QUOTE]
The resume option works on a file full of residues (plus the original number and sigma). It'll go through them one by one.
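So, for example, 200 stage 1 curves saved by Prime95 into results.txt can all be given their stage 2 with a single command (bounds as discussed earlier in the thread):

[CODE]ecm -v -resume results.txt 1 11e6-30e9[/CODE]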

lorgix 2015-03-07 14:56

[QUOTE=lycorn;397228]I don't think this is a matter of "fairness". As Stage 2 is faster with GMP-ECM and Stage 1 is faster with Prime95, I was trying to find, for the current needs of the search, where to use Prime95 alone and where to use a combination of the two in order to get tests done faster. I am in no way claiming that one program is "better" than the other. As for the values of B2 used, they are the defaults provided by the program, so I assumed they are optimized. If I reduce the values, the number of curves would certainly be higher, and the smaller number of curves is, as far as I understood it, the main point of running GMP-ECM for Stage 2 instead of Prime95.
There is one caveat, though: performing Stage 1 with P95 and Stage 2 with GMP-ECM is very cumbersome if you want to run, say, 100 or 200 curves on a single exponent. The switches -resume and -c are incompatible, which means the curves have to be run one by one, the relevant residue from P95 Stage 1 manually fed to GMP-ECM, and the programs restarted. Unless I am missing some fundamental issue here, this overhead undermines to a large extent the advantage of running the combination of P95 and GMP-ECM.[/QUOTE]
I meant that you should run each with optimized parameters for the most interesting comparison. And no, they are not optimized.
Read post #16 in this thread for how to optimize it. The theoretical background is in RDS's paper.

So, to be fair, you should use each program the best way possible. That is what I meant. For the other part; see what axn wrote in #31.

lycorn 2015-03-07 16:26

Thanks very much to you both for your answers.

[QUOTE=axn;397231]
Then:
The resume option works on a file full of residues (plus the original number and sigma). It'll go thru them one by one.[/QUOTE]

The fundamental thing I was missing was that GMP-ECM goes through the file packed with P95 residues without having to specify the number of curves...

@lorgix: As for the optimization, I'll have a look at it, but yes, I got your point.

Madpoo 2015-04-22 18:15

As a somewhat "fly on the wall" observer so far, I'm just dipping my toes into the GMP-ECM thing.

I have to admit, it's not very user friendly, and I mean that with all due respect - nothing against how the actual code itself works.

It's just that, let's say I have several servers with many spare GBs of RAM available; maybe I want to throw some work its way.

If I have a system with 12 cores, it's very easy to have Prime95 manage all the work with a nice, tidy worktodo file, and I can set the affinity so each worker is using its own core.

So what it sounds like is that I should use this nice ecosystem for stage 1, and then get into hack mode to pipe all that into GMP-ECM (albeit running on Windows... Linux may be much nicer). I had a heckuva time getting just one instance of "ecm.exe" to affine to a single core. I can do a "start /affinity <hex mask> ecm.exe ..." from the command line, and that's about as close as I could get to truly automating the affinity. Otherwise I was launching it and then using Task Manager to set the affinity after the fact, manually. No thanks. Some things you can't do when "START"ing with affinity, like pipe stdin to the process you're launching or redirect output. You would kind of need one of the 3rd party (or Microsoft) apps to change the affinity of a running process, and be able to script it.

Oh, and bear in mind, in my case I could be running 12-20 instances at once if I had the memory available.

Then there's the question of output from the program. If I use start /affinity, it launches in a new console window, and when the program finishes, it closes it. Unless I had the foresight to capture all of the output by piping it somewhere, if it did manage to find anything it'd be lost. And while I found the -inp option to read the work in from a file that contains something like "2^1277-1", I had no luck finding a switch that would output any text; it just writes the large save file. And when using "start" it's not always easy to redirect console output to a file with any certainty (you may wind up redirecting the output of your "start" command and not the program you're actually starting).

See the dilemma? I appreciate that GMP-ECM is faster, but for a simple fella like myself it would take far too much effort to actually use it for more than just tinkering. Then there's the issue of getting some kind of result out of it that could be fed into Primenet. In theory it's not that hard to output a text line similar to what mfaktc does. All the manual results page does is parse the text for the relevant info, and as George said, ECM results are accepted on the honor system.

Lest this come across as mere complaining, I'll be more specific about what would be nice to see. Could GMP-ECM, at least the Win64 compiled version, do these things:[LIST][*]a switch to output text to a file instead of (or as well as) the screen[*]a switch to set CPU affinity for the process[*]and, considering that experienced users know about optimizing things, could it automatically pick optimal bounds for the type of work?[/LIST]
Ideally it'd be cool if Prime95 used the faster code that GMP-ECM apparently uses so there's not the back-and-forth shuffle in the first place, but I know George is busy. His list of "nice to have" changes in Prime95 is probably long already. :smile:

lorgix 2015-04-22 18:38

Appending "<factorme.txt >> log.txt" to the command line
is how I log results (full sketch below).
-n sets low priority.
Optimal bounds depend on many different things. Luckily, the efficiency curve is pretty flat around the optimum.
Adjusting bounds based on experience is not hard.
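Spelled out, the logging invocation would be something along these lines (file names are just examples; factorme.txt holds the number to factor, read from stdin):

[CODE]ecm -v -n 11e6 <factorme.txt >> log.txt[/CODE]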

VBCurtis 2015-04-22 19:45

[QUOTE=Madpoo;400653]and, considering that experienced users know about optimizing things, could it automatically pick optimal bounds for the type of work?

Ideally it'd be cool if Prime95 used the faster code that GMP-ECM apparently uses so there's not the back-and-forth shuffle in the first place, but I know George is busy. His list of "nice to have" changes in Prime95 is probably long already. :smile:[/QUOTE]

How would we tell GMP-ECM what the type of work is, such that it could make a decision about optimal bounds? In present use, we supply it an input number and a B1 chosen to most quickly find factors of a certain size. We choose B1 based on how much previous factoring effort was done. To get GMP-ECM to do that, we would have to somehow have the input file include all previous factoring work, while also adding quite a bit of code to determine what B1 now makes sense for the memory available, the size of the composite, and the size of the hoped-for factor. That would be pretty complicated, though it comes fairly naturally to users of the program after some experience. YAFU and ecm.py both automate some of these choices: you can tell YAFU what digit level of ECM has already been run, to what digit level you wish to go, and how many cores to use for ECM, and it automagically chooses B1/B2 bounds and fires up the proper number of ecm.exe processes.

I agree that it would be quite nice for ECM work overall for Prime95 to be able to call ecm directly for stage 2! Alas, it's a very small use case compared to the overall project.

lycorn 2015-04-23 13:49

@Madpoo:

The following considerations assume you will run Stage 1 on Prime95 and Stage 2 on GMP-ECM.

Make a copy of the Prime95 directory, and configure the program for a single worker. You'll be running Stage 1 from there.
Insert the line GmpEcmHook=1 in prime.txt.
Run P95 with B1=B2= the default B1 bound applicable to the given exponent (for 1277, B1=8e08). You may later adjust this value based on experience. Run as many curves as you wish.
Upon finishing, the results.txt file from P95 will contain a bunch of Stage 1 residues, one from each curve.
Copy the results.txt file to the GMP-ECM home directory, open a cmd prompt and run ecm -v -resume results.txt -save <choose a file name> 8e08-8e08. You may as well keep both executables and associated files in the same directory; it will save you the hassle of moving files around.
GMP-ECM will sweep through the residues file, running Stage 2 for each one, and recording the residues in the file chosen for saving (GMP-ECM creates the file, you just supply the name).
Upon finishing, the save file will contain the Stage 2 residues.
To report the results to the server, contact GW - he may get you sorted out.
To set the affinity, I use Task Manager. It's really not a must to set the affinity, but it keeps ecm from stealing resources from the lower priority workers - when running ecm on one core and P95 on the remaining ones, if I don't stick ecm to the idle core, the CPU usage of the P95 workers decreases.
I appreciate there's a difference between running this combo of programs at home on a single desktop computer and in a datacenter environment, where the ability to automate tasks is a must. But anyway, if you want to give it a shot, that's how I do it. As there is some manual work involved, you may wish to do long runs - jobs that take a couple of days on each Stage - to reduce the manual overhead. Try using much larger bounds, as suggested by VBCurtis, and see what that gives in terms of running time and memory usage. It would be very nice to put all that memory to good use!
See the Readme file that comes with the GMP-ECM package for more info.
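Putting it together, one full cycle for M1277 would look something like this (again, the ECM2 field order is from my memory of undoc.txt, so double-check it):

[CODE]# worktodo.txt entry for the Prime95 stage 1 pass (B2=B1, with GmpEcmHook=1 set)
ECM2=1,2,1277,-1,800000000,800000000,100
# then hand the saved residues to GMP-ECM for stage 2
ecm -v -resume results.txt -save s2save.txt 8e08-8e08[/CODE]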
HTH

Madpoo 2015-04-23 16:33

[QUOTE=lycorn;400703]@Madpoo:

The following considerations assume you will run Stage 1 on Prime95 and Stage 2 on GMP-ECM.

Make a copy of the Prime95 directory, and configure the program for a single worker. You'll be running Stage 1 from there.
Insert the line GmpEcmHook=1 in prime.txt.
Run P95 with B1=B2= the default B1 bound applicable to the given exponent (for 1277, B1=8e08). You may later adjust this value based on experience. Run as many curves as you wish.
Upon finishing, the results.txt file from P95 will contain a bunch of Stage 1 residues, one from each curve.
Copy the results.txt file to the GMP-ECM home directory, open a cmd prompt and run ecm -v -resume results.txt -save <choose a file name> 8e08-8e08. You may as well keep both executables and associated files in the same directory; it will save you the hassle of moving files around.
GMP-ECM will sweep through the residues file, running Stage 2 for each one, and recording the residues in the file chosen for saving (GMP-ECM creates the file, you just supply the name).
Upon finishing, the save file will contain the Stage 2 residues.
To report the results to the server, contact GW - he may get you sorted out.
To set the affinity, I use Task Manager. It's really not a must to set the affinity, but it keeps ecm from stealing resources from the lower priority workers - when running ecm on one core and P95 on the remaining ones, if I don't stick ecm to the idle core, the CPU usage of the P95 workers decreases.
I appreciate there's a difference between running this combo of programs at home on a single desktop computer and in a datacenter environment, where the ability to automate tasks is a must. But anyway, if you want to give it a shot, that's how I do it. As there is some manual work involved, you may wish to do long runs - jobs that take a couple of days on each Stage - to reduce the manual overhead. Try using much larger bounds, as suggested by VBCurtis, and see what that gives in terms of running time and memory usage. It would be very nice to put all that memory to good use!
See the Readme file that comes with the GMP-ECM package for more info.
HTH[/QUOTE]

Thanks, that's good advice. That's pretty much what I ended up doing, although I thought the -save option wouldn't save the output it shows... maybe it doesn't exactly, but it does save the important stuff.

It's probably worth setting the affinity anyway, even if you don't also have Prime95 running, just because Windows will switch you around to different cores at a whim and you'll lose any benefit of the core caching. There'd be a shared L3 cache where it won't matter, but the L1/L2 caching could have a benefit. And if you have a multi-socket system then switching between NUMA nodes would be detrimental.

I started a run last night with the output of some Prime95 stage 1 results. I have a file of 24 curves of M1277 with B1=29e8.

Feeding those into GMP-ECM tells me that stage 2 will use an estimated 17GB, although I see ecm.exe using nearly 19 GB currently. Fortunately this dev system is only using 110 of its 144 GB right now (including that 19 GB of ECM). I guess I could run another one. :)

Here's the output of one of the stage 2 runs... maybe you can help figure out if everything looks okay. I was a little confused about the input number being 0x1FFF... but it does say it's doing "special division for factor of 2^1277-1"

One of the main takeaways there is stage 2 took 5754 seconds (let's call it 96 minutes). I guess that's okay for stage 2 at B1=29e8? The CPU is one core of an X5690 @ 3.47 GHz. It's done 6 of the 24 so far and the timing is pretty consistent; the longest one was 5815 seconds, not that much off from the 5754 of the quickest one.

[CODE]GMP-ECM 6.4.4 [configured with MPIR 2.6.0] [ECM]
Resuming ECM residue saved with Prime95
Input number is 0x1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF (385 digits)
Using special division for factor of 2^1277-1
Using B1=2900000000-2900000000, B2=105101237217912, polynomial Dickson(30), sigma=3389447693745215
dF=2097152, k=2, d=23130030, d2=13, i0=113
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
10 28 89 309 1175 4842 21459 102212 513971 2730842
Step 1 took 15ms
Using 44 small primes for NTT
Estimated memory usage: 17G
Initializing tables of differences for F took 8703ms
Computing roots of F took 687282ms
Building F from its roots took 623860ms
Computing 1/F took 229328ms
Initializing table of differences for G took 2421ms
Computing roots of G took 758954ms
Building G from its roots took 619859ms
Computing roots of G took 777922ms
Building G from its roots took 621860ms
Computing G * H took 120859ms
Reducing G * H mod F took 123829ms
Computing polyeval(F,G) took 1162766ms
Computing product of all F(g_i) took 2953ms
Step 2 took 5753906ms
Expected time to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
16.51h 1.89d 5.92d 20.59d 78.24d 322.47d 3.92y 18.65y 93.78y 498.26y
[/CODE]

lycorn 2015-04-23 17:26

I think it looks just fine. Pretty much what I would expect.
The "weird" string of 1´s is actually your input number (2^1277-1) in binary, 385 digits long.
The memory use is consistent with the values I get in my system. Using B1=8e08, the estimated mem usage is 4049 MB, but the during the run the usage fluctuates along the progress of the computation, and may use up to 4600 MB. So estimating 17GB and using up to 19 seems OK.
The times also seem reasonable, compared with the ones on my system (much better than mines, actually...).
All in all, I think you´re doing pretty well. Hope you´ll find a factor soon... I´m not making fun, just being optimistic :wink: It´s good to get help from dream machines like the ones you use.

One last note: comparing the number of curves estimated by P95 alone to find a ~65-digit factor (360,000) with the number estimated by the combo you're using (21,459) shows how much more effective your setup is, even noting that each of "your" curves takes longer to run.

Madpoo 2015-04-23 18:15

[QUOTE=lycorn;400722]I think it looks just fine. Pretty much what I would expect.
The "weird" string of F's is actually your input number (2^1277-1) written out in hexadecimal; the 385 digits refers to its decimal length.[/QUOTE]

Doh! I should have known that. All the F's should have tipped me off that it was the 2^x-1 at work.

[QUOTE=lycorn;400722]
The memory use is consistent with the values I get on my system. Using B1=8e08, the estimated memory usage is 4049 MB, but during the run the usage fluctuates as the computation progresses, and may reach 4600 MB. So estimating 17GB and using up to 19 seems OK.
The times also seem reasonable compared with the ones on my system (much better than mine, actually...).
All in all, I think you're doing pretty well. Hope you'll find a factor soon... I'm not making fun, just being optimistic :wink: It's good to get help from dream machines like the ones you use.

One last note: comparing the number of curves estimated by P95 alone to find a ~65-digit factor (360,000) with the number estimated by the combo you're using (21,459) shows how much more effective your setup is, even noting that each of "your" curves takes longer to run.[/QUOTE]

Thanks for the feedback. It's helpful to know I'm not doing something totally stupid beforehand in case I do throw some resources at this here and there.

I'm still doing triple-checks on self-verified LL runs and I figure this might be another fun mini-project once that's out of the way.

lorgix 2015-04-23 19:43

[QUOTE=Madpoo;400728]Thanks for the feedback. It's helpful to know I'm not doing something totally stupid beforehand in case I do throw some resources at this here and there.

I'm still doing triple-checks on self-verified LL runs and I figure this might be another fun mini-project once that's out of the way.[/QUOTE]
See post #16 in this thread for how to optimize the bounds. You will arrive at a higher B2.
Use -maxmem to limit RAM usage if you want to.

Madpoo 2015-04-25 17:57

[QUOTE=Madpoo;400653]If I have a system with 12 cores, it's very easy to have Prime95 manage all the work with a nice, tidy worktodo file, and I can set the affinity so each worker is using its own core.[/QUOTE]

I managed to script out the finer points of running multiple "ecm.exe" processes at once.

Notably, on the 12-core (2x6 core) system I'm testing on, I wanted to run 12 instances of ECM at once, each with affinity to a specific core, running at "Idle" priority, and logging its output to its own file.

The powershell command to set affinity/priority on a running process would have failed if there were multiple processes with the same name. Solution: make copies of "ecm.exe" named "ecm1.exe" through "ecm12.exe". Done. :smile:

To launch ECM itself, I kick it off in its own command window:
start /min cmd /c ecm%corenum%.exe -v -c %curves% -inp %infile% %b1% > %infile%.out

In the batch file I set corenum, curves, b1 and infile to whatever (the "infile" might simply be named "1277" and contain "2^1277-1")

It kicks off a command console in its own minimized window, running that specifically named exe file.

Then I have to pause about a second (do this however... I use the command line replacement "TCMD" from JPSoft which has a "delay" command). That gives the exe time to start up before the next step.

The next step is to run a simple set of PowerShell commands:
PowerShell "$Process = Get-Process ecm%corenum%; $Process.ProcessorAffinity=%mask%; $Process.PriorityClass = 'Idle'"

For that to work you need to set the %mask% variable to the bit-mask affinity for the specific core you want "corenum" to run on.

For my Windows system with hyperthreading enabled, CPUs 1 and 2 are the physical and HT pair of one core, 3 and 4 are the next, etc.

So if corenum=1 then I'd want the mask to be 0x1, corenum=2 would be 0x4, corenum=3 is 0x10, etc. Going back to my preference for TCMD as a command replacement, it's easy to get the decimal mask (which PowerShell can use) with this little thing:
set mask=%@eval[1 shl 2*(%corenum%-1)]

(just does a shift left of 0x1 by 2*(corenum-1); the factor of 2 is because I skip over the HT cores).

Or you can do some "if %corenum%==5 set mask=256" things to keep it cmd.exe-compatible.

It's a little Rube Goldberg'ish but it works.

For doing stage 2 work where Prime95 did the stage 1, I could work that in as well but because of the memory usage I could probably only run 1 or maybe 2 on a machine at once anyway, and at that point it's fine to just manually set the affinity/priority as I feed it a list of a couple hundred stage 1 curves to finish and leave it alone.
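For anyone who wants it in one place, here's a rough consolidated sketch of the above (untested as written; the variable values and the timeout are placeholders):

[CODE]@echo off
set corenum=1
set curves=24
set infile=1277
set b1=29e8
rem plain cmd.exe can compute the mask too: shift 1 left by 2*(corenum-1)
set /a "mask=1<<(2*(corenum-1))"
start /min cmd /c ecm%corenum%.exe -v -c %curves% -inp %infile% %b1% ^> %infile%.out
rem give the exe a moment to start before grabbing it
timeout /t 2 >nul
PowerShell "$p = Get-Process ecm%corenum%; $p.ProcessorAffinity=%mask%; $p.PriorityClass = 'Idle'"[/CODE]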

lycorn 2015-04-26 01:14

[QUOTE=Madpoo;400909]

For doing stage 2 work where Prime95 did the stage 1, I could work that in as well but because of the memory usage I could probably only run 1 or maybe 2 on a machine at once anyway, and at that point it's fine to just manually set the affinity/priority as I feed it a list of a couple hundred stage 1 curves to finish and leave it alone.[/QUOTE]

Doing Stage 1 with Prime95, at least for these very small exponents, is definitely the best bet, as Prime95 is a lot faster than GMP-ECM for S1. Feeding GMP-ECM a large batch of P95 S1 curves and forgetting it for a while renders the overhead negligible. Well, sort of...

R.D. Silverman 2015-04-27 14:01

[QUOTE=lycorn;400940]Doing Stage 1 with Prime95, at least for these very small exponents, is definitely the best bet, as Prime95 is a lot faster than GMP-ECM for S1. Feeding GMP-ECM a large batch of P95 S1 curves and forgetting it for a while renders the overhead negligible. Well, sort of...[/QUOTE]

I am curious. How much faster is P95 than GMP-ECM for S1 for Mersenne/Wagstaff numbers?
If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would
be very fast....

I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5).

Madpoo 2015-04-27 15:42

[QUOTE=lycorn;400940]Doing Stage 1 with Prime95, at least for these very small exponents, is definitely the best bet, as Prime95 is a lot faster than GMP-ECM for S1. Feeding GMP-ECM a large batch of P95 S1 curves and forgetting it for a while renders the overhead negligible. Well, sort of...[/QUOTE]

Over the weekend I had a few machines running Prime95 doing just stage 1 at B1=29e8 for M1277. I spit out 240 curves between the machines, using P95 just for stage 1.

Now I've taken those 240 results, and between these same systems I can run 10 instances at a time without causing memory issues (I'm assuming 20 GB per instance). Each instance will process 24 of the results. In detail, it's 4 machines (2x6-core Xeons): 3 of the 4 had enough extra RAM to run 2 at once, and one of them was only using 50 GB out of 144 GB, so I'm giving it 4 to run at once. Too bad that machine has slightly slower CPUs (2.53 GHz instead of 3.47 GHz... oh well).

I just slightly tweaked the process I put together for having gmp-ecm do everything... mostly I didn't want to click around a bunch to set affinity and priority, and I only had to modify it slightly to grab a particular results file.

Anyway, those are all going now. I can't remember what it was last time, but I think it was taking something like 90 minutes or so for it to run stage 2 on each result. I guess in 36 hours I should have all 240 curves finished with stage 2.

BTW, I just let gmp-ecm figure out B2 (it's using B2=105101237217912).

Madpoo 2015-04-27 15:47

[QUOTE=Madpoo;401028]Anyway, those are all going now. I can't remember what it was last time, but I think it was taking something like 90 minutes or so for it to run stage 2 on each result. I guess in 36 hours I should have all 240 curves finished with stage 2.[/QUOTE]

It occurred to me that my previous tests were just running one per machine. Doing 2 (or 4) on the same system may create memory contention well beyond what I saw before and slow down the individual threads. I guess I'll find out.

I guess I could try setting the affinity on multiple threads to different NUMA nodes, but I don't even know if that would help. When the process started up it didn't have an affinity so the Windows scheduler (which is NUMA aware) probably didn't know to try and give it memory on those channels?

One more reason the affinity setting *should* be baked into the EXE so it's defined at launch.

Madpoo 2015-04-27 18:15

[QUOTE=Madpoo;401029]It occurred to me that my previous tests were just running one per machine. Doing 2 (or 4) on the same system may create memory contention well beyond what I saw before and slow down the individual threads. I guess I'll find out.[/QUOTE]

It seems like it might be just fine.

On the faster 3.47 GHz systems it's still taking about 94 minutes to do stage 2 on each one, which is about the same as before when only one was running (this one has 2 going). I did manually change the affinity so each instance was on its own NUMA node, which may have been a good idea.

On the slower 2.53 GHz systems, which I didn't benchmark previously, it's doing each stage 2 in about 128 minutes.

2.53 GHz / 3.47 GHz = 0.73
94 minutes / 128 minutes = 0.73

So yeah, seems like it scales pretty evenly with CPU speed, everything else being equal (similar servers, similar mem speed, same Xeon class CPU, etc)

lycorn 2015-04-28 00:16

Well, that's what I call a powerhouse! Those 240 curves will end up counting as probably more than 1000 at the default/current bounds set by the Primenet server.

Have you measured the time each curve is taking on S1? According to some posts in this thread, the ratio S1:S2 should be ~1:0.7. So if S2 is taking 94 minutes and S1 more than 94/0.7 = 134 minutes, you could try larger B2 values until you attain that ratio.

lycorn 2015-04-28 00:34

[QUOTE=R.D. Silverman;401017]I am curious. How much faster is P95 than GMP-ECM for S1 for Mersenne/Wagstaff numbers?
If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would
be very fast....

I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5).[/QUOTE]

I've timed some P95 and GMP-ECM runs, and in general P95's S1 is faster than GMP-ECM's. I still have just a few data points, so the results aren't yet that meaningful; I will try to find some time to run some more, also comparing different B1 values for the same exponent size, and will then post some more robust conclusions.
It's already becoming apparent, though, that for smaller exponents the difference is not as big as for larger ones, in line with what you wrote in your post.

Madpoo 2015-04-28 03:30

[QUOTE=lycorn;401067]Well, that's what I call a powerhouse! Those 240 curves will end up counting as probably more than 1000 at the default/current bounds set by the Primenet server.

Have you measured the time each curve is taking on S1? According to some posts in this thread, the ratio S1:S2 should be ~1:0.7. So if S2 is taking 94 minutes and S1 more than 94/0.7 = 134 minutes, you could try larger B2 values until you attain that ratio.[/QUOTE]

Pppphhhbbth. :smile:

The server that was taking 128 minutes in stage 2 with gmp-ecm was taking 5 hours, 45 minutes in stage 1 with P95.

Sounds like you're saying that I should really be goosing up B2 until it takes around 4 hours in stage 2?

VBCurtis 2015-04-28 04:14

[QUOTE=Madpoo;401076]Pppphhhbbth. :smile:

The server that was taking 128 minutes in stage 2 with gmp-ecm was taking 5 hours, 45 minutes in stage 1 with P95.

Sounds like you're saying that I should really be goosing up B2 until it takes around 4 hours in stage 2?[/QUOTE]

Using the -B2scale 4 flag when calling GMP-ECM stage 2 will multiply B2 by four, which will double the memory requirement and double the stage 2 time. If your server can handle that memory load, that will be more efficient (though only 5% or so more efficient than your current default settings - not an "OMG must do!" issue).

Edit: It's less clear that increasing B2 without increasing memory (which increases the number of steps to finish stage 2, a parameter GMP-ECM calls "k") will prove more efficient. If you are memory-limited to the current stage 2 footprint, it may be that only a small increase in B2 is worthwhile. B2 increases in large steps, corresponding to a unit change in k-value. If your current test uses k=3, the next B2 would be 1/3rd bigger with k=4 for the same memory footprint. GMP-ECM by default uses k values 2 through 6, followed by a doubling of memory and a reset to k=2 for a bigger, more efficient work-chunk. If you set maxmem={number too small for the default k choice}, the program will stick to the smaller work-chunk size, increasing k beyond 6. This is usually less efficient, but experimentation is required (it depends on individual machine specs).
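As a concrete sketch (resume syntax as used earlier in the thread; -maxmem takes a value in MB):

[CODE]# quadruple the default B2 (roughly doubles stage 2 memory and time)
ecm -v -B2scale 4 -resume results.txt 29e8-29e8
# or cap memory and let the program raise k instead
ecm -v -maxmem 17000 -B2scale 4 -resume results.txt 29e8-29e8[/CODE]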

lorgix 2015-04-28 06:08

I'm pretty sure B2 should be increased, even with a memory constraint.

R.D. Silverman 2015-04-28 10:44

[QUOTE=Madpoo;401076]Pppphhhbbth. :smile:

The server that was taking 128 minutes in stage 2 with gmp-ecm was taking 5 hours, 45 minutes in stage 1 with P95.

Sounds like you're saying that I should really be goosing up B2 until it takes around 4 hours in stage 2?[/QUOTE]

If people would ever bother to READ my joint paper with Sam Wagstaff, they would learn that
optimal ECM performance is obtained when one spends the same amount of TIME in step 1
and step 2.

lycorn 2015-04-28 12:09

According to posts #16, 22 and 32 of this thread, written by someone who apparently has read your papers, the ratio is 1:0.7, hence my observation. I lack the math background to fully understand what's involved, so I trusted what seemed to come from a reliable source. I'm obviously happy to be corrected by someone as qualified in the subject as yourself.

ATH 2015-04-28 12:33

[QUOTE=R.D. Silverman;401017]I am curious. How much faster is P95 than GMP-ECM for S1 for Mersenne/Wagstaff numbers?
If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would
be very fast....

I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5).[/QUOTE]

Prime95 and GMPECM running on 1 core each, stage 1 only:

[CODE]M1277   B1=110M  Prime95: 16 min    GMPECM: 27 min
M1277   B1=44M   Prime95: 6.5 min   GMPECM: 9.5 min
M2137   B1=44M   Prime95: 8.5 min   GMPECM: 29 min
M10061  B1=44M   Prime95: 34.5 min  GMPECM: 277 min[/CODE]

lorgix 2015-04-28 12:49

[QUOTE=R.D. Silverman;401092]If people would ever bother to READ my joint paper with Sam Wagstaff, they would learn that
optimal ECM performance is obtained when one spends the same amount of TIME in step 1
and step 2.[/QUOTE]

[QUOTE=lycorn;401094]According to posts #16, 22 and 32 of this thread, written by someone who apparently has read your papers, the ratio is 1:0.7, hence my observation. I lack the math background to fully understand what's involved, so I trusted what seemed to come from a reliable source. I'm obviously happy to be corrected by someone as qualified in the subject as yourself.[/QUOTE]
I believe the paper arrives at the conclusion that stage 2 should take 0.41 units of time when stage 1 + stage 2 takes 1 unit of time.
Meaning stage1:stage2 :: 0.59:0.41, i.e. roughly 1:0.7.

Madpoo 2015-04-28 14:51

[QUOTE=R.D. Silverman;401092]If people would ever bother to READ my joint paper with Sam Wagstaff, they would learn that
optimal ECM performance is obtained when one spends the same amount of TIME in step 1
and step 2.[/QUOTE]

Bah, reading is for chumps. I like to reinvent wheels. :smile:

On the serious side, that is useful, as is all the discussion on optimizing gmp-ecm. In case it's not painfully obvious, this is the first time I've used gmp-ecm for anything at all, and I'll freely admit I skimmed the basics before diving in. Mostly because it's just a curiosity on my part, at this point anyway.

VBCurtis 2015-04-28 15:02

[QUOTE=R.D. Silverman;401092]If people would ever bother to READ my joint paper with Sam Wagstaff, they would learn that
optimal ECM performance is obtained when one spends the same amount of TIME in step 1
and step 2.[/QUOTE]

Your paper does not seem to say this, at all. On page 6, the paper claims that for a wide range of values of K, the ratio of stage 2 speed to stage 1 speed, choosing B2 = 0.4K*B1 is optimal. That choice would result in stage 2 taking 40% as long as stage 1. It uses K = 100 and B2 = 41*B1 as a specific example. If stage 2 runs 100 times faster than stage 1, and we run stage 2 from B1 to 41*B1, we are spending 40% of the stage 1 time on stage 2, correct?

Since I am able to read, and have done so, please point me to where in your paper you contradict the conclusion I have summarized above. Also note that empirical testing with current GMP-ECM versions supports using 40% of stage 1 time for stage 2, with the flat-topped peak you mentioned in the paper (that is, it doesn't much matter if stage 2 is 35% or 45%).

Madpoo 2015-04-28 15:14

[QUOTE=ATH;401096]Prime95 and GMP-ECM running on one core each, stage 1 only:

[CODE]M1277   B1=110M  Prime95: 16 min     GMP-ECM: 27 min
M1277   B1=44M   Prime95: 6.5 min    GMP-ECM: 9.5 min
M2137   B1=44M   Prime95: 8.5 min    GMP-ECM: 29 min
M10061  B1=44M   Prime95: 34.5 min   GMP-ECM: 277 min[/CODE][/QUOTE]

Interesting.

I'm still convinced that it would be awesome if either:[LIST][*]Prime95 was updated so its stage 2 runs better (could use more memory, code enhancements, whatever)[*]GMP-ECM was updated to run stage 1 faster[/LIST]
It's just a little unfortunate that each program does one stage exceedingly well but not the other, so if we want peak performance we end up using both.

I'd still prefer to see Prime95 updated because a) it's easy to run multiple workers, b) results are easily checked in to Primenet, and c) it's easier to mix work types on one system with multiple cores.

That said, I wouldn't be opposed to using gmp-ecm entirely if it were easier to a) set affinity (and priority should be idle by default, like Prime95), and b) generate results compatible with Prime95 results that can be manually checked in.

I don't mind that gmp-ecm (ecm.exe in particular) is command line... it kind of makes it easier to manage multiple systems that way, I guess. So I'm thinking more along the lines of an "-affinity 0x001" type of thing, where it's a basic affinity mask, or, since it doesn't really multithread, make it even easier on the user: "-affinity 4" would run on core #4, so the user doesn't have to figure out the mask. Kind of like the "Affinity=x" setting in Prime95's local.txt.

The output of each curve could be boiled down to a compatible P95 results line like:
M1277 completed 1 ECM curve, B1=2900000000, B2=105101237217912

Without a checksum like Prime95 would generate, the server won't accept it, but I could imagine we'd be able to work something out so results from gmp-ecm could be checked in automatically... without emailing George. :)

Like, I guess right now when all 240 of my stage 2's are done, I'd let George know I did 240 curves with those bounds?

Madpoo 2015-04-28 15:40

[QUOTE=VBCurtis;401077]Using the -B2scale 4 flag when calling GMP-ECM for stage 2 will multiply B2 by four, which will double the memory requirement and double the stage 2 time. If your server can handle that memory load, that will be more efficient (though only 5% or so more efficient than your current default settings, not an "OMG must do!" issue).

Edit: It's less clear that increasing B2 without increasing memory (which increases the number of steps to finish stage 2, a parameter GMP-ECM calls "k") will prove more efficient. If you are memory-limited to this current stage 2 footprint, it may be that only a small increase in B2 is worthwhile. B2 increases in large steps, corresponding to a unit change in k-value. If your current test uses k=3, the next B2 would be 1/3rd bigger with k=4 for the same memory footprint. GMP-ECM by default uses k values 2 through 6, followed by a doubling of memory and a reset to k=2 for a bigger, more efficient work-chunk. If you set maxmem={number too small for default k choice}, the program will stick to the smaller work-chunk size, increasing k beyond 6. This is usually less efficient, but experimentation is required (it depends on individual machine specs).[/QUOTE]

I'm doing a timing run on M1277 with B1=2900000000, B2=425327623620922 (with the -B2scale 4 you mentioned). First I'm curious to do my own timing of stage 1 to see how it compares with Prime95 (similar system as before, with a 2.53 GHz core). Then I'm curious to see how the memory usage goes.

I have 3 of those running on a system right now. Stage 2 should use 34-40 GB per instance (this is a passive cluster node... lots of memory and nothing to do most of the time).

So this will tell me a few things:
1) The difference between stage 1 times on a system like that with P95 and gmp-ecm
2) About how much actual memory stage 2 will take (it said 34 GB, but I'm guessing more like 38 GB)
3) What the stage 1/stage 2 ratio of times would be like purely within gmp-ecm

If the stage 1 time isn't dramatically different from the 5 hours 45 minutes Prime95 took, I may overlook it in favor of convenience. I could "set and forget" gmp-ecm to run 1000 curves over some period of weeks and not think about it until I check the output. Otherwise it would require regular care and feeding, and who has time for that? :smile:
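
(For reference, the hybrid workflow being timed here, as a sketch; file names are examples, and the exact ECM2= field order should be double-checked against undoc.txt for your Prime95 version:)

[CODE]prime.txt:     GmpEcmHook=1                                # write stage 1 residues in GMP-ECM resume format
worktodo.txt:  ECM2=1,2,1277,-1,2900000000,2900000000,1    # stage 1 only: B2 set equal to B1

then:  ecm -resume results.txt 29e8 425327623620922        # hand the residue to GMP-ECM for stage 2[/CODE]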

Madpoo 2015-04-28 16:53

I thought I'd poke my head into the Prime95 source, specifically the ECM routine.

I'm glancing through ecm.c; first off, I am not a programmer, so I'm skimming most of it.

I saw this in the stage 2 section:
[QUOTE]/* Process stage 2 in chunks if the bit array will be really large. */
/* By default, the bit array is limited to 250MB. Remember each byte */
/* corresponds to 8 odd numbers which is a range of 16. */[/QUOTE]

That might explain why stage 2 always seems to only use a limited amount of memory, unlike what P-1 is capable of doing.

The next bit of code has a reference to a config setting I've not seen before, "MaximumBitArraySize". It maxes out at 2000, so I guess you're only going to be able to use up to 2 GB. Is that an old 32-bit OS limitation? Any reason it couldn't be set higher?

The next section of code seems to alter the ending point value based on that bit array size:
[CODE]if ((pm1data->C - pm1data->C_start) / 1000000 / 16 > max_bitarray_size)
pm1data->C = pm1data->C_start + max_bitarray_size * 1000000 * 16;[/CODE]

It would force stage 2 to run in smaller chunks... I guess like the "-k n" option in gmp-ecm.

This may sound dumb of me, but could one reason Prime95 does stage 2 slower than gmp-ecm be its low cap on memory? I may test this theory by comparing stage 2 runs on P95 against gmp-ecm with the "-maxmem 250" option. I'm also curious to try adding "MaximumBitArraySize=2000" to my prime.txt to see what happens. Would it use more RAM in stage 2? Who knows... stay tuned. :smile:

Are there other significant differences in the underlying code itself besides the amount of memory they'll use?

axn 2015-04-28 17:23

[QUOTE=Madpoo;401112]Are there other significant differences in the underlying code itself besides the amount of memory they'll use?[/QUOTE]
Yes. GMP-ECM uses an O(sqrt(B2)) algorithm for stage 2, while P95 uses an O(B2) algorithm.
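
(For scale: the B2=425327623620922, about 4.3e14, used earlier in this thread gives [tex]\sqrt{B2} \approx 2 \times 10^7[/tex]. So, ignoring the substantial constants and log factors of the polynomial arithmetic, the fast stage 2 does on the order of 10^7 units of work where a linear method would do 10^14; the memory appetite is the price of that gap.)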

Madpoo 2015-04-28 21:51

[QUOTE=Madpoo;401104]...
The output of each curve could be boiled down to a compatible P95 results line like:
M1277 completed 1 ECM curve, B1=2900000000, B2=105101237217912

Without a checksum like Prime95 would generate, the server won't accept it, but I could imagine we'd be able to work something out so results from gmp-ecm could be checked in automatically... without emailing George. :)
[/QUOTE]

Well, I'm not so sure now. I see results just like that checked in from other people. Mine didn't work when I pasted it, although I did use "curve" instead of "curves"; I got an error about a missing checksum. :)

Maybe only certain trusted users are allowed to add entries like that. Might make sense... you wouldn't want just anyone pasting in whatever.

EDIT: Yes, that's it. I found out why my submissions fail while a certain other trustworthy user's are accepted. If I continue crunching away at M1277 or any other ECM work, I could petition George to be included. I think he trusts me. :smile:

R.D. Silverman 2015-04-28 23:10

[QUOTE=VBCurtis;401103]Your paper does not seem to say this, at all. On page 6, the paper claims that for a wide range of K (the ratio of stage 2 speed to stage 1 speed), choosing B2 = 0.4K*B1 is optimal.[/QUOTE]

The mathematical analysis showed that if B2 and B1 were chosen optimally, that
[tex] \partial B2/\partial B1 = K[/tex] where step 2 was K times as fast as step 1.


Thus, if B1 changes to B1 + x, then B2 needs to change to B2 + Kx to remain
at optimality. This in turn implies that step 2 should take as long as step 1 to
remain at optimality.
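
(To make the intermediate step explicit, a back-of-envelope sketch that assumes each stage's time is linear in the range it covers: if stage 1 covers [0, B1] at speed [tex]s[/tex] and stage 2 covers [B1, B2] at speed [tex]Ks[/tex], the times are [tex]B_1/s[/tex] and [tex](B_2-B_1)/(Ks)[/tex]. Integrating [tex]\partial B_2/\partial B_1 = K[/tex] from the origin gives [tex]B_2 = KB_1[/tex], so the step 2 time is [tex](K-1)B_1/(Ks) \approx B_1/s[/tex] for large K, i.e. equal to the step 1 time.)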

R.D. Silverman 2015-04-28 23:17

[QUOTE=R.D. Silverman;401167]The mathematical analysis showed that if B2 and B1 were chosen optimally, that
[tex] \partial B2/\partial B1 = K[/tex] where step 2 was K times as fast as step 1.


Thus, if B1 changes to B1 + x, then B2 needs to change to B2 + Kx to remain
at optimality. This in turn implies that step 2 should take as long as step 1 to
remain at optimality.[/QUOTE]

Let me add that as the B2/B1 ratio changes for a particular implementation, step 2 does not REMAIN
K times as fast uniformly over the range [B1, B2]. The ".4K" was for a particular
implementation on a SUN-3 computer and determined by experiment. If the machine
had much much more memory, the ".4" coefficient would have become larger. If enough
memory is available so that step 2 need not be done in segments, and one uses a "perfect"
implementation of a convolution based Step 2, then one should always choose B2 so that
step 1 and step 2 take the same time.

VBCurtis 2015-04-29 01:29

[QUOTE=R.D. Silverman;401168]Let me add that as the B2/B1 ratio changes for a particular implementation then step 2 does not REMAIN
K times as fast uniformly over the range [B1, B2]. The ".4K" was for a particular
implementation on a SUN-3 computer and determined by experiment. If the machine
had much much more memory, the ".4" coefficient would have become larger. If enough
memory is available so that step 2 need not be done in segments, and one uses a "perfect"
implementation of a convolution based Step 2, then one should always choose B2 so that
step 1 and step 2 take the same time.[/QUOTE]

If 0.4K was determined by experiment on that implementation, why wouldn't we determine experimentally the correct ratio for the current (non-perfect implementation) GMP-ECM? Why do you insist on badgering everyone that stage 2 time should be equal to stage 1 time, when your own paper determined stage 2 time = 40% of stage 1 time was optimal on that machine, and our present software is also generally not optimally efficient with stage 1 time = stage 2 time?

I mean, I appreciate that with perfect software and no memory constraint you are correct, but we do not have such software at our disposal. So, why would we choose your mathematically correct but in practice (with current software and memory systems) less efficient ratio?

I have done empirical studies with GMP-ECM and composites around 200 digits, and found that a stage 2 time of 40 to 50% of stage 1 time is quite a bit more efficient than stage 2 time equal to stage 1 time. It appears your paper supports the possibility that my experiments were not in error, and also that your point can be correct in principle yet not for our software. I am trusting the curve counts GMP-ECM calculates with the "-v" flag for a given t-level (say, t50) and each variant of B1/B2, as I believe that code is based on your paper.

If you agree this is possible, please stop telling us to choose B2 to make stage 2 time equal to stage 1 time. Every experiment I have run says this is bad advice with current software. If you disagree, please run your own experiments and post data.

Madpoo 2015-04-29 02:50

[QUOTE=Madpoo;401112]This may sound dumb of me, but could one reason Prime95 does stage 2 slower than gmp-ecm be its low cap on memory? I may test this theory by comparing stage 2 runs on P95 against gmp-ecm with the "-maxmem 250" option. I'm also curious to try adding "MaximumBitArraySize=2000" to my prime.txt to see what happens. Would it use more RAM in stage 2? Who knows... stay tuned. :smile:[/QUOTE]

Well, interesting...

gmp-ecm with "-maxmem 250" performs horribly.

I fed it one of the same stage 1 outputs from Prime95 that I used before... B1=29e8 and autoselected B2. It finished up stage 2 in I think 128 minutes when it could gobble up as much RAM as it wanted.

Allowed only 250 MB, it chose a much smaller B2 (80847864213490). It's been going for several hours now (5 or 6? I lost track) and it's still going.

On the other hand, with Prime95 I didn't see any difference in memory usage for stage 2 when I set that MaximumBitArraySize=2000. With B1=800000000 and B2=80000000000 it still does stage 1 in a reasonable time but then stage 2 didn't complete any faster than it had before and still only uses 207 MB of RAM.

Just for fun I tried a Prime95 stage 2 with B1=29e8 and B2=425327623620922, which is what I have three gmp-ecm threads doing right now (using 37 GB per thread). I'm still waiting to see just how long stage 2 will take with that "-B2scale 4" option; definitely longer than the 128 minutes from before, but we'll see. I need to refresh my memory on how long the stage 1s were taking in Prime95 on different machines, to see if they're closer now. If memory serves, it was 5 hours 45 minutes on a similar system, so if this stage 2 finishes pretty soon it'll be close to that.

Umm... let's just say that in Prime95, stage 1 took as long as it did before when I was doing just stage 1 with the ECM hook, but stage 2 goes VERY slow. I think it was estimating completion of that single curve somewhere around October. I didn't bother. It did start using a lot more memory at those bounds than I've ever seen it use with smaller ones... 18.8 GB for a single worker. Still terribly, terribly slow, though.
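
(In command form, the two gmp-ecm experiments above look roughly like this; the residue file name is an example:)

[CODE]ecm -resume m1277_stage1.txt -maxmem 250 29e8    # cap stage 2 at 250 MB: B2 shrinks and it crawls
ecm -resume m1277_stage1.txt -B2scale 4 29e8     # quadruple the default B2 instead, memory unconstrained[/CODE]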

Madpoo 2015-04-29 03:00

[QUOTE=VBCurtis;401182]I mean, I appreciate that with perfect software and no memory constraint you are correct, but we do not have such software at our disposal. So, why would we choose your mathematically correct but in practice (with current software and memory systems) less efficient ratio?[/QUOTE]

Everything else aside, it may also bear noting that trying to optimize the ratio not just between stage 1 and stage 2 but also between two totally different programs is ultimately going to cause confusion. It just doesn't feel totally apples-to-apples to me.

Not to mention the "human factor" introducing inefficiency: shuffling files around, launching different programs, etc. Heck, you're going to lose some amount of time just doing that. :smile:

Times like this I wish I'd spent a little more time in my programming classes so I could just hack the best parts of the ECM code into one or the other. LOL

axn 2015-04-29 03:14

[QUOTE=Madpoo;401191]Times like this I wish I'd spent a little more time in my programming classes so I could just hack the best parts of the ECM code into one or the other. LOL[/QUOTE]

There is (or at least there used to be) a provision in GMP-ECM to incorporate the gwnum library to speed up stage 1 for Mersenne numbers. If you run "ecm -printconfig" you can see a reference to GWNUM_VERSION. Since then, gwnum's scope has been expanded to support the general form (k*b^n+c)/f, so I guess it is time to revisit this integration.

ATH 2015-04-29 11:15

Can anyone compile GMP-ECM with the --with-gwnum feature? I cannot get it to work.
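
(For what it's worth, the intended sequence on a Unix-ish setup is roughly the following; the paths and version are examples, and gwnum may need its platform-specific makefile, e.g. make64:)

[CODE]cd prime95/gwnum && make             # build gwnum.a from the Prime95 source tree
cd ../../ecm-6.4.4
./configure --with-gwnum=../prime95/gwnum
make && make check[/CODE]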

R.D. Silverman 2015-04-29 12:41

[QUOTE=VBCurtis;401182]If 0.4K was determined by experiment on that implementation, why wouldn't we determine experimentally the correct ratio for the current (non-perfect implementation) GMP-ECM?
[/QUOTE]

Because there is no such thing as the "correct" ratio. It will be different for different computers.
It will be highly dependent on aggregate DRAM size, L2/L3 cache size, memory latency,
memory bandwidth, etc.

The optimal thing to do for ANY computer is to measure how long step 2 takes relative
to step 1 and select the B2/B1 ratio so that GMP-ECM spends as much time in step 2
as it does in step 1, [b]FOR THAT COMPUTER[/b]. This ratio will be DIFFERENT
for different computers.

This must be done on a case-by-case basis.




[QUOTE]

I have done empirical studies with GMP-ECM and composites around 200 digits, and found that stage 2 time of 40 to 50% of stage 1 time is quite a bit more efficient than stage 2 time equal to stage 1 time.
[/QUOTE]

Oh really? Explain how you determined that your selection of parameters gave the largest possible
probability of success per unit time spent. How did you do this analysis?



[QUOTE]
If you agree this is possible, please stop telling us to choose B2 to make stage 2 time equal to stage 1 time.

[/QUOTE]

Read what I wrote above.

Oh, and as you ask me to stop telling you certain things, I can ask you to stop prattling
about optimal parameter selection. You clearly do not understand it.

R.D. Silverman 2015-04-29 13:00

[QUOTE=Madpoo;401191]

Times like this I wish I'd spent a little more time in my programming classes so I could just hack the best parts of the ECM code into one or the other. LOL[/QUOTE]

Wrong class. You should have spent more time in your MATH classes.

Madpoo 2015-04-29 19:36

[QUOTE=R.D. Silverman;401229]Wrong class. You should have spent more time in your MATH classes.[/QUOTE]

Nah, programming. Just in the sense that both gmp-ecm and prime95 have source code, so in theory it would be possible to take the relevant libraries from one or the other and fit it in.

Madpoo 2015-04-29 19:41

[QUOTE=Madpoo;401190]gmp-ecm with "-maxmem 250" performs horribly.[/QUOTE]

So horribly, in fact, that I called it quits after it had been running about a day. gmp-ecm gives no real sense of progress, so I have no clue how far along it was, but it proved the point: you really hobble stage 2 by limiting how much RAM it can use. Don't do it. Feed that beast with yummy RAM chips.


[QUOTE=Madpoo;401190]Just for fun I tried doing a stage 2 with B1=29e8 and B2=425327623620922 which is what I have three threads doing right now (and using 37 GB per thread with gmp-ecm). I'm still waiting to see just how long stage 2 will take with that "-B2scale 4" option. Definitely longer than the 128 minutes from before but we'll see.[/QUOTE]

It ended up taking 4.55 hours (as a reminder, stage 1 in Prime95 took 5.75 hours).

I won't wade into the perfect-ratio "debate", but suffice it to say that "-B2scale 4", with ample memory available, did bring the stage 2 time closer to the stage 1 time, compared to the 128 minutes stage 2 took previously.

Dubslow 2015-04-29 19:54

[QUOTE=Madpoo;401267]Nah, programming. Just in the sense that both gmp-ecm and prime95 have source code, so in theory it would be possible to take the relevant libraries from one or the other and fit it in.[/QUOTE]

No, you really do mean math. No amount of programming experience and knowledge can tell you which pieces fit where without understanding what the pieces are doing.

lorgix 2015-04-29 20:47

[QUOTE=Madpoo;401269]So horribly, in fact, that I called it quits after it had been running about a day. gmp-ecm gives no real sense of progress, so I have no clue how far along it was, but it proved the point: you really hobble stage 2 by limiting how much RAM it can use. Don't do it. Feed that beast with yummy RAM chips.

It ended up taking 4.55 hours (as a reminder, stage 1 in Prime95 took 5.75 hours).

I won't wade into the perfect-ratio "debate", but suffice it to say that "-B2scale 4", with ample memory available, did bring the stage 2 time closer to the stage 1 time, compared to the 128 minutes stage 2 took previously.[/QUOTE]
You limited available RAM very strongly. More realistically, people will be using a k somewhere in the area of 4~20. If you tell it to use something like half or a quarter of what it "wants", you won't see results as dramatic as when limiting it to 250 MB.

If I were you I'd try -B2scale 3 next. According to the estimated time to find a factor (given by -v), that should be close to optimal. In practice, that is. (Not sure why RDS seems to disagree.)

Madpoo 2015-04-29 20:57

[QUOTE=Dubslow;401270]No you really do mean math. No amount of programming experience and knowledge can tell you which pieces fit where without understanding what the pieces are doing.[/QUOTE]

I'll step back a bit and say that I don't [I]think[/I] I'd have problems figuring out the math part. It's a little rusty (a lot rusty?) but nevertheless...

Sadly for me, my college years involved Pascal, and since it was an electrical degree, I did some work in assembly on specific devices. Beyond that, my knowledge of C/C++/C# is what I'd best call "anecdotal": a passing familiarity and that's about it. I've been known to fire up Visual Studio and debug or modify things where necessary (and I'm sure the devs I work with shudder in fear), but no more than that.

Madpoo 2015-04-29 20:59

[QUOTE=lorgix;401276]You limited available RAM very strongly. More realistically, people will be using a k somewhere in the area of 4~20. If you tell it to use something like half or a quarter of what it "wants", you won't see results as dramatic as when limiting it to 250 MB.

If I were you I'd try -B2scale 3 next. According to the estimated time to find a factor (given by -v), that should be close to optimal. In practice, that is. (Not sure why RDS seems to disagree.)[/QUOTE]

The idea there was to kind of mimic the (lousy?) memory choice Prime95 seems to be making for stage 2, using a puny 207 MB.

Conclusion: it's harmful. :smile: But then you don't have to be a rocket surgeon to know that.

lorgix 2015-04-29 21:35

[QUOTE=Madpoo;401278]The idea there was to kind of mimic the (lousy?) memory choice Prime95 seems to be making for stage 2, using a puny 207 MB.

Conclusion: it's harmful. :smile: But then you don't have to be a rocket surgeon to know that.[/QUOTE]
The implementations are very different, as other people have commented.

About the RAM: I just wanted to make sure you wouldn't be afraid to limit RAM usage [I]a little[/I] in cases where that might be convenient.

(If you want to put all those GBs of RAM to good use while contributing to GIMPS you may want to look at P-1.)

Madpoo 2015-04-30 01:08

[QUOTE=lorgix;401280]The implementations are very different, as other people have commented.

About the RAM: I just wanted to make sure you wouldn't be afraid to limit RAM usage [I]a little[/I] in cases where that might be convenient.

(If you want to put all those GBs of RAM to good use while contributing to GIMPS you may want to look at P-1.)[/QUOTE]

Okay, well, let me put it this way: is there any reason GMP-ECM would use the slower stage 1 code? Or that Prime95 would use a slower stage 2 routine?

(I don't think I'd ever deliberately limit the memory to gmp-ecm, but if I did it wouldn't be as drastic as going from 18 GB to 250 MB) :smile:

VBCurtis 2015-04-30 02:49

[QUOTE=Madpoo;401290]Okay, well, let me put it this way: is there any reason GMP-ECM would use the slower stage 1 code? Or that Prime95 would use a slower stage 2 routine?

(I don't think I'd ever deliberately limit the memory to gmp-ecm, but if I did it wouldn't be as drastic as going from 18 GB to 250 MB) :smile:[/QUOTE]

Prime95 uses an entirely different stage 2 routine, one that takes substantially less memory. Your case with M1277 is unusual, in that the number is small enough for GMP-ECM stage 2 to be practical. Another thread recently explored how small "small" needs to be, with a guess that exponents above 30-40k are too large for GMP-ECM's memory requirements (of course, memory of the size you have would extend this up a bit). Exponents from 1277 to 40k are such a small range that the folks who chase ECM factors down here can choose to use both programs if need be. The other 99.9x% of users would have no use for GMP-ECM stage 2.

A similar "rare case" argument applies to stage 1: Prime95's faster code works only for Mersenne numbers and (I think?) Fermat numbers. GMP-ECM is general-purpose, so including the Prime95 stage 1 for just those cases is not helpful for most users. Of course, that's different from answering "what stops someone from combining the two?"; specifically, plugging the FFT stage 1 into GMP-ECM should be possible, though well beyond my pay grade. I'm a button-pusher, not a programmer.

As for flags to try for improved efficiency: Instead of using B2scale, try -k 2. This will choose the next-larger B2 size such that stage 2 will be done in just two steps. This uses more memory, but is usually faster.
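
(Concretely, reusing the earlier residue file name as an example:)

[CODE]ecm -resume m1277_stage1.txt -k 2 29e8    # stage 2 in two blocks; per the description above, GMP-ECM picks the next B2 that fits[/CODE]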

VBCurtis 2015-04-30 03:11

[QUOTE=R.D. Silverman;401228]Because there is no such thing as the "correct" ratio. It will be different for different computers.
It will be highly dependent on aggregate DRAM size, L2/L3 cache size, memory latency,
memory bandwidth, etc.

The optimal thing to do for ANY computer is to measure how long step 2 takes relative
to step 1 and select the B2/B1 ratio so that GMP-ECM spends as much time in step 2
as it does in step 1, [b]FOR THAT COMPUTER[/b]. This ratio will be DIFFERENT
for different computers.

This must be done on a case-by-case basis.
[/QUOTE]

I'm sorry, we're looking at different "ratios". I am not referring to the ratio of B2 to B1. I am referring to the ratio of time spent on B2 to time spent on B1. In your paper, B2 = 0.4K * B1 for that computer/implementation/memory/etc, where K is the relative speed of stage 2 to stage 1. So, if B2 = K * B1, then the machine would have spent as much time in stage 2 as it did in stage 1, since stage 2 is K times faster. Since you chose B2 = 0.4K * B1, it is spending 40% as long in stage 2 as it did in stage 1. (right?)

Yet you insist that stage 2 time should always be equal to stage 1 time for best results. Your answer for the ratio I am looking for is "always 1, no matter what, on any computer," even though your paper suggests 0.4 was the correct ratio of stage 2 time to stage 1 time for the machine and software you used in that research. This is the contradiction I am trying to figure out. Can you please explain my error in interpreting your paper?

My previous post tried to make the point that this ratio is subject to discovery by experiment, as you said you did in that paper to find the 0.4 that you used. I used the -v flag in GMP-ECM to find the number of curves for a desired t-level, and tried to minimize the time to complete the listed number of curves for said t-level. This is the suggested procedure in the GMP-ECM documentation to optimize parameters for a particular computer/use case. I understand this to be standard practice, so I thought I was explaining my method when I made mention of the -v flag. I am interested to learn why this is an incorrect procedure, since that is what led me to conclude a ratio of stage 2 time ~ half of stage 1 time is the most efficient way to complete a desired t-level.
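
(The procedure in command form; the composite is a placeholder for whatever C200 is being tested:)

[CODE]echo "<C200 to test>" | ecm -v 11e6    # -v prints the expected curve counts per factor-digit level, plus timings[/CODE]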

Madpoo 2015-04-30 03:51

[QUOTE=VBCurtis;401297]...well beyond my pay grade. I'm a button-pusher, not a programmer.[/QUOTE]

I'm with you on that. :smile:

[QUOTE=VBCurtis;401297]As for flags to try for improved efficiency: Instead of using B2scale, try -k 2. This will choose the next-larger B2 size such that stage 2 will be done in just two steps. This uses more memory, but is usually faster.[/QUOTE]

I'll give that a try next time, thanks.

VBCurtis 2015-04-30 04:21

A little data sample, for a C166 on a Haswell-E with DDR4:
B1 = 11e6, stage 1 took 46 sec for each run
B2 = 35e9, GMP-ECM default: 18 sec (this is k = 3 steps in stage 2)
t40 686 curves = 12.2 hr
t45 4482 curves = 79.7 hr
t50 33676 curves = 24.95 days

B2 = 240e9, 47 sec and equal to stage 1 time:
t40 503 curves = 13.0 hr
t45 3243 curves = 83.8 hr
t50 23946 curves = 25.78 days

So, for a common B1 choice of 11e6, the default GMP-ECM choice, where stage 2 takes 40% the time of stage 1, has a lower expected time to find factors of every size: factors smaller than the target, factors of the size one targets with B1 = 11M (45 digits), and larger factors one might get lucky and find. This is one instance, but the data I took for a wide variety of B1/B2 shows in every case that selecting B2 such that stage 2 time = stage 1 time has a longer expected time to find any size factor than stage 2 time = 40-50% of stage 1 time.

I also tested B2 = 96e9, 28 sec for the next-largest B2 that uses k = 2 steps in stage 2:
t40 582 curves = 12.0 hr
t45 3791 curves = 77.9 hr
t50 28048 curves = 24.0 days
These expected times are better than GMP-ECM default, showing the improved efficiency of choosing B2 such that stage 2 uses just 2 steps. This is why I suggest the -k 2 flag rather than B2scale in general.
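
(As a sanity check, the hours above are just curves multiplied by per-curve time, stage 1 plus stage 2: for the default B2, t40 is 686 × (46 s + 18 s) = 43,904 s ≈ 12.2 hr, and for B2 = 96e9 it is 582 × (46 s + 28 s) = 43,068 s ≈ 12.0 hr.)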

lorgix 2015-04-30 11:47

[QUOTE=VBCurtis;401307]A little data sample, for a C166 on a Haswell-E with DDR4:
B1 = 11e6, stage 1 took 46 sec for each run
B2 = 35e9, GMP-ECM default: 18 sec (this is k = 3 steps in stage 2)
t40 686 curves = 12.2 hr
t45 4482 curves = 79.7 hr
t50 33676 curves = 24.95 days

B2 = 240e9, 47 sec and equal to stage 1 time:
t40 503 curves = 13.0 hr
t45 3243 curves = 83.8 hr
t50 23946 curves = 25.78 days

So, for a common B1 choice of 11e6, the default GMP-ECM choice, where stage 2 takes 40% the time of stage 1, has a lower expected time to find factors of every size: factors smaller than the target, factors of the size one targets with B1 = 11M (45 digits), and larger factors one might get lucky and find. This is one instance, but the data I took for a wide variety of B1/B2 shows in every case that selecting B2 such that stage 2 time = stage 1 time has a longer expected time to find any size factor than stage 2 time = 40-50% of stage 1 time.

I also tested B2 = 96e9, 28 sec for the next-largest B2 that uses k = 2 steps in stage 2:
t40 582 curves = 12.0 hr
t45 3791 curves = 77.9 hr
t50 28048 curves = 24.0 days
These expected times are better than GMP-ECM default, showing the improved efficiency of choosing B2 such that stage 2 uses just 2 steps. This is why I suggest the -k 2 flag rather than B2scale in general.[/QUOTE]
This is how I arrived at stage1time:stage2time::1:0.695.

About k: It is true that, all else being equal, a lower k is more efficient. But you will often need a different k to reach the optimal relationship between stage 1 time and stage 2 time. I mostly use B2scale, but I also pay attention to k and B2. Sometimes large steps occur, and I then test two (or more) different settings.

R.D. Silverman 2015-04-30 13:02

[QUOTE=VBCurtis;401307]A little data sample, for a C166 on a Haswell-E with DDR4:
B1 = 11e6, stage 1 took 46 sec for each run
B2 = 35e9, GMP-ECM default: 18 sec (this is k = 3 steps in stage 2)
t40 686 curves = 12.2 hr
t45 4482 curves = 79.7 hr
t50 33676 curves = 24.95 days

B2 = 240e9, 47 sec and equal to stage 1 time:
t40 503 curves = 13.0 hr
t45 3243 curves = 83.8 hr
t50 23946 curves = 25.78 days

So, for a common B1 choice of 11e6, the default GMP-ECM choice, where stage 2 takes 40% the time of stage 1, has a lower expected time to find factors of every size,
[/QUOTE]

Did you pull this conclusion from your ass? I see no, repeat no, zilch, nada computation
for the probability of finding a factor given your B1, B2 choices.

What is your OBJECTIVE FUNCTION???

Where are the probability computations? You cannot conclude anything about
"lower expected time" unless you know what the probability of success is.

I see a lot of data about computation time. Nowhere do I see any kind of an
optimization. WHAT ARE YOU OPTIMIZING? It certainly is not the primary
objective, which is to maximize the probability of success per UNIT TIME spent.

lorgix 2015-04-30 13:10

[QUOTE=R.D. Silverman;401322]Did you pull this conclusion from your ass? I see no, repeat no, zilch, nada computation
for the probability of finding a factor given your B1, B2 choices.

What is your OBJECTIVE FUNCTION???

Where are the probability computations? You cannot conclude anything about
"lower expected time" unless you know what the probability of success is.

I see a lot of data about computation time. Nowhere do I see any kind of an
optimization. WHAT ARE YOU OPTIMIZING? It certainly is not the primary
objective, which is to maximize the probability of success per UNIT TIME spent.[/QUOTE]
He is using the -v flag of GMP-ECM, and trusting the values it returns.

fivemack 2015-04-30 13:52

[QUOTE=R.D. Silverman;401322]Did you pull this conclusion from your ass? I see no, repeat no, zilch, nada computation
for the probability of finding a factor given your B1, B2 choices.[/quote]

Yes you do, that's the 't40 503 curves' part; gmp-ecm has implementations of all the Dickman-function work necessary for computing that probability, and reports its reciprocal there.

R.D. Silverman 2015-04-30 14:55

[QUOTE=lorgix;401325]He is using the -v flag of GMP-ECM, and trusting the values it returns.[/QUOTE]

This is a non-answer to the question I asked.

R.D. Silverman 2015-04-30 14:59

[QUOTE=fivemack;401331]Yes you do, that's the 't40 503 curves' part; gmp-ecm has implementations of all the Dickman-function work necessary for computing that probability, and reports its reciprocal there.[/QUOTE]

A choice of B1, B2 is optimal only for a particular factor SIZE. Dickman's function only returns
probabilities of smoothness for given B1,B2 with respect to a composite integer of a given size.
There is no such specification of size in the discussion. All that is presented is run-time info
for various B1, B2 choices.

There are still no probabilities given in the discussion, nor do I see any computations
involving probability/time.

R.D. Silverman 2015-04-30 15:21

[QUOTE=fivemack;401331]Yes you do, that's the 't40 503 curves' part; gmp-ecm has implementations of all the Dickman-function work necessary for computing that probability, and reports its reciprocal there.[/QUOTE]

We see, for example, a choice of B1, B2 for t40 and a time of 12.2 hrs.
We also see a choice for t40 where B2 is raised from 35e9 to 240e9 and a time of 13 hrs.

We do not see how the probability of success CHANGED when B2 went from 35e9 to 240e9.
Did it change more than or less than the relative time increase? (i.e. 13/12.2)

If the probability went up by more than a factor of 13/12.2, then it is clear that the choice
of 240e9 (i.e. equal time in step 1 and step 2) does give an improvement.

But nowhere in the discussion are the relative probabilities discussed!

bsquared 2015-04-30 15:38

[QUOTE=R.D. Silverman;401341]We see, for example, a choice of B1, B2 for t40 and a time of 12.2 hrs.
We also see a choice for t40 where B2 is raised from 35e9 to 240e9 and a time of 13 hrs.

We do not see how the probability of success CHANGED when B2 went from 35e9 to 240e9.
Did it change more than or less than the relative time increase? (i.e. 13/12.2)

If the probability went up by more than a factor of 13/12.2, then it is clear that the choice
of 240e9 (i.e. equal time in step 1 and step 2) does give an improvement.

But nowhere in the discussion are the relative probabilities discussed![/QUOTE]

The probability stays the same - that is the definition of "t40": running that many curves with the provided parameters leaves a 1/e chance of missing a factor of 40 digits. GMP-ECM computes all of the probabilities and is also aware of how long it takes itself to compute a curve; running with -v reports these details. When B2=35e9 it will take 12.2 hours to reach a 1/e chance of missing a 40-digit factor. With B2=240e9 it takes 13 hours to reach the same chance. So the relative probabilities are being discussed.
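
(To spell the definition out: if each curve independently finds a given 40-digit factor with probability [tex]p[/tex], the chance of missing it after [tex]N[/tex] curves is [tex](1-p)^N \approx e^{-Np}[/tex]; t40 is the curve count [tex]N = 1/p[/tex], so the miss probability is [tex]1/e[/tex] by construction, whatever B1/B2 produced that [tex]p[/tex].)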

R.D. Silverman 2015-04-30 16:43

[QUOTE=bsquared;401346]The probability stays the same - that is the definition of "t40".

[/QUOTE]

Ah. That's the part that I was missing.

VBCurtis 2015-04-30 18:19

[QUOTE=lorgix;401325]He is using the -v flag of GMP-ECM, and trusting the values it returns.[/QUOTE]

This is not-quite-true; I average the times for a bunch of stage 1 runs and a couple of stage 2 runs for each choice of B2, and then multiply by the number of curves expected to be required to find a factor. This gives a more accurate expected-time-to-find, and that accuracy is useful when trying to find improvements of 5% or less. I do trust -v to provide expected curve counts, as listed in my sample data.

Doing such work on a spreadsheet also allows for quick calculation of expected time for a variety of B1/B2 values; invoking ECM with -v provides the expected curve counts immediately, so I can call ECM with a combination not already tried, note the curve counts required, and use the timings from previous stage 1 & stage 2 runs to find the expected time to find a factor. Also, stage 1 time is almost precisely linear in B1, so one can extrapolate for a variety of B1 values.

Now that Mr Silverman understands the definition of t40 and t45, how does the choice of B2=240e9 improve the probability of success PER UNIT TIME to find any size factor over B2=35e9, or 96e9?

R.D. Silverman 2015-04-30 18:46

[QUOTE=VBCurtis;401353]This is not-quite-true; I average the times for a bunch of stage 1 runs and a couple of stage 2 runs for each choice of B2, and then multiply by the number of curves expected to be required to find a factor. This gives a more accurate expected-time-to-find, and that accuracy is useful when trying to find improvements of 5% or less. I do trust -v to provide expected curve counts, as listed in my sample data.

Doing such work on a spreadsheet also allows for quick calculation of expected time for a variety of B1/B2 values; invoking ECM with -v provides the expected curve counts immediately, so I can call ECM with a combination not already tried, note the curve counts required, and use the timings from previous stage 1 & stage 2 runs to find the expected time to find a factor. Also, stage 1 time is almost precisely linear in B1, so one can extrapolate for a variety of B1 values.

Now that Mr Silverman understands the definition of t40 and t45, how does the choice of B2=240e9 improve the probability of success PER UNIT TIME to find any size factor over B2=35e9, or 96e9?[/QUOTE]

Clearly, for this implementation/hardware combination, it doesn't.

bsquared 2015-04-30 19:35

[QUOTE=R.D. Silverman;401355]Clearly, for this implementation/hardware combination, it doesn't.[/QUOTE]

Perhaps it's worth noting at this point that GMP-ECM implements features that violate assumptions made in the paper, so it should be no surprise that the results of the paper do not hold. For instance, stage 2 in GMP-ECM uses a fast polynomial-arithmetic continuation, so the time to run stage 2 up to B2 is not a linear function of (B2 - B1) as assumed in the paper; i.e., K increases with increasing (B2 - B1). GMP-ECM can also find factors by the Brent-Suyama extension, which is not treated in the paper.

ATH 2015-04-30 22:08

[QUOTE=VBCurtis;401307]These expected times are better than GMP-ECM default, showing the improved efficiency of choosing B2 such that stage 2 uses just 2 steps. This is why I suggest the -k 2 flag rather than B2scale in general.[/QUOTE]

-k 1 should be even better if you have enough RAM, since it does stage 2 in one chunk. For P-1 and P+1, -k 1 is always faster (I have measured it on many occasions), but with ECM, -k 2 sometimes seems faster for some reason.

M1277 B1=110e6 B2=2e12:
[CODE]k=1: Step 2 took 519171ms (Peak memory usage: 4400MB)
k=2: Step 2 took 720366ms (Peak memory usage: 4593MB)[/CODE]
So here -k 1 was faster, yet -k 2 took more RAM, which makes no sense.

With P-1 and P+1 it never uses the exact B2 value you type in, but chooses the next larger value that fits its parameters. For ECM it displays the exact value you typed in, which makes me think it might run stage 2 with a larger, parameter-fitting B2 in the background; that could be why -k 2 is sometimes faster, if it fits the parameters better.

