mersenneforum.org > Factoring Projects > Lone Mersenne Hunters

Old 2015-04-28, 12:49   #56
lorgix
 
lorgix's Avatar
 
Sep 2010
Scandinavia

267₁₆ Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
If people would ever bother to READ my joint paper with Sam Wagstaff, they would learn that
optimal ECM performance is obtained when one spends the same amount of TIME in step 1
and step 2.
Quote:
Originally Posted by lycorn View Post
According to posts #16, 22 and 32 of this thread, written by someone who apparently has read your papers, the ratio is 1:0.7, hence my observation. I lack the math background to fully understand what's involved, so I trusted what seemed to come from a reliable source. I'm obviously happy to be corrected by someone as qualified in the subject as yourself.
I believe the paper arrives at the conclusion that stage2 should take 0.41 units of time when stage1+stage2 takes 1 unit of time.
That is, stage1:stage2 ≈ 0.59:0.41, or about 1:0.7.
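A quick sanity check of that arithmetic (not from the paper itself; just the division behind the quoted ratio):

```python
# If stage1 + stage2 = 1 unit of time and stage2 = 0.41 units,
# then stage1 = 0.59 units, and stage2 as a fraction of stage1 is:
stage1, stage2 = 0.59, 0.41
ratio = stage2 / stage1
print(round(ratio, 2))  # 0.69, commonly rounded to the quoted 1:0.7
```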
Old 2015-04-28, 14:51   #57
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

6516₈ Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
If people would ever bother to READ my joint paper with Sam Wagstaff, they would learn that
optimal ECM performance is obtained when one spends the same amount of TIME in step 1
and step 2.
Bah, reading is for chumps. I like to reinvent wheels.

On the serious side, that is useful, as is all the discussion on optimizing gmp-ecm. In case it's not painfully obvious, this is the first time I've used gmp-ecm for anything at all, and I'll freely admit I only skimmed the basics before diving in. Mostly because it's just a curiosity on my part, at this point anyway.
Old 2015-04-28, 15:02   #58
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

2×2,927 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
If people would ever bother to READ my joint paper with Sam Wagstaff, they would learn that
optimal ECM performance is obtained when one spends the same amount of TIME in step 1
and step 2.
Your paper does not seem to say this, at all. On page 6, the paper claims that for a wide range of values of K (the ratio of stage 2 speed to stage 1 speed), choosing B2 = 0.4K*B1 is optimal. That choice results in stage 2 taking 40% as long as stage 1. It uses K = 100 and B2 = 41*B1 as a specific example. If stage 2 runs 100 times faster than stage 1, and we run stage 2 from B1 to 41*B1, we are spending 40% of the stage 1 time on stage 2, correct?

Since I am able to read, and have done so, please point me to where in your paper you contradict the conclusion I have summarized above. Also, note that empirical testing with current GMP-ECM versions supports using 40% of stage 1 time for stage 2, with the flat-topped peak you mentioned in the paper (that is, it doesn't much matter whether stage 2 is 35% or 45%).
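The 40% figure follows directly from the numbers in the example above (a quick check using the paper's illustrative K = 100, B2 = 41·B1 values, with stage 1 time normalized to 1):

```python
K = 100       # stage 2 runs K times faster than stage 1, per the example
B1 = 1.0      # normalize so that stage 1 takes B1 / 1 = 1 unit of time
B2 = 41 * B1  # stage 2 covers [B1, 41*B1], a range of 40*B1

stage1_time = B1                # 1 unit, by normalization
stage2_time = (B2 - B1) / K     # a range of 40*B1 processed at K times the speed
print(stage2_time / stage1_time)  # 0.4
```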
Old 2015-04-28, 15:14   #59
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

2·13·131 Posts
Default

Quote:
Originally Posted by ATH View Post
Prime95 and GMPECM running on 1 core each, stage1 only:

Code:
M1277  B1=110M  Prime95: 16 min    GMPECM: 27 min
M1277  B1=44M   Prime95: 6.5 min   GMPECM: 9.5 min
M2137  B1=44M   Prime95: 8.5 min   GMPECM: 29 min
M10061 B1=44M   Prime95: 34.5 min  GMPECM: 277 min
Interesting.

I'm still convinced that it would be awesome if either:
  • Prime95 was updated so its stage 2 ran better (could use more memory, code enhancements, whatever)
  • GMP-ECM was updated to run stage 1 faster

It's just a little unfortunate that we have two programs that each do one stage exceedingly well but not the other, so if we want peak performance we're using both.

I'd still prefer to see Prime95 updated because a) it's easy to run multiple workers, b) results are easily checked in to PrimeNet, and c) it's easier to mix work types on one system with multiple cores.

That said, I wouldn't be opposed to using gmp-ecm entirely if it were easier to a) set affinity (and priority should default to idle, like Prime95), and b) generate results compatible with Prime95's that can be manually checked in.

I don't mind that gmp-ecm (ecm.exe in particular) is command line... it kind of makes it easier to manage multiple systems that way, I guess. So I'm thinking more along the lines of an "-affinity 0x001" type of thing where it's a basic affinity mask, or, since it doesn't really multithread, make it even easier on the user: "-affinity 4" would run on core #4, so the user doesn't have to figure out the mask. Kind of like "Affinity=x" in Prime95's "local.txt".
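The core-number-to-mask translation being suggested is a one-liner (purely hypothetical — gmp-ecm has no "-affinity" flag today, and the helper name is made up for illustration):

```python
def core_to_mask(core: int) -> int:
    """Translate a core number into the affinity bitmask the user
    would otherwise have to compute by hand (hypothetical helper)."""
    return 1 << core

print(hex(core_to_mask(0)))  # 0x1
print(hex(core_to_mask(4)))  # 0x10
```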

The output of each curve could be boiled down to a compatible P95 results line like:
M1277 completed 1 ECM curve, B1=2900000000, B2=105101237217912

Without a checksum like Prime95 would generate, the server won't accept it, but I could imagine we'd be able to work something out so results from gmp-ecm could be checked in automatically... without emailing George. :)
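A hypothetical helper for producing that results line (the function is my invention for illustration; neither gmp-ecm nor Prime95 provides it):

```python
def p95_result_line(exponent: int, curves: int, b1: int, b2: int) -> str:
    # Mimics the Prime95-style ECM result line quoted above (no checksum).
    plural = "s" if curves != 1 else ""
    return f"M{exponent} completed {curves} ECM curve{plural}, B1={b1}, B2={b2}"

print(p95_result_line(1277, 1, 2900000000, 105101237217912))
# M1277 completed 1 ECM curve, B1=2900000000, B2=105101237217912
```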

Like, I guess right now when all 240 of my stage 2's are done, I'd let George know I did 240 curves with those bounds?
Old 2015-04-28, 15:40   #60
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

2·13·131 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
Using the -B2scale 4 flag when calling stage 2 GMP-ECM will multiply B2 by four, which will double the memory requirement and double stage 2 time. If your server can handle that memory load, that will be more efficient (though only 5% or so more efficient than your current default settings; not an "OMG must do!" issue).

Edit: It's less clear that increasing B2 without increasing memory (which increases the number of steps to finish stage 2, a parameter GMP-ECM calls "k") will prove more efficient. If you are memory-limited to this current stage 2 footprint, it may be that only a small increase in B2 is worthwhile. B2 increases in large steps, corresponding to a unit change in k-value. If your current test uses k=3, the next B2 would be 1/3rd bigger with k=4 for the same memory footprint. GMP-ECM by default uses k values 2 through 6, followed by a doubling of memory and a reset to k=2 for a bigger, more efficient work-chunk. If you set maxmem={number too small for the default k choice}, the program will stick to the smaller work-chunk size, increasing k beyond 6. This is usually less efficient, but experimentation is required (it depends on individual machine specs).
I'm doing a timing run of stage 1 on M1277 with B1=2900000000, B2=425327623620922 (with the -B2scale 4 you mentioned). First I'm curious to do my own timing of stage 1 to see how it compares with Prime95... similar system as before with a 2.53 GHz core. Then I'm curious to see how the memory usage goes.

I have 3 of those running on a system right now. Stage 2 should use 34-40 GB per instance (this is a passive cluster node... lots of memory and nothing to do most of the time).

So this will tell me a few things:
1) The difference between stage 1 times on a system like that with P95 and gmp-ecm
2) About how much actual memory stage 2 will take (it said 34 GB but I'm guessing more like 38 GB)
3) What the stage 1/stage 2 ratio of times would be like purely within gmp-ecm

If the stage 1 time isn't dramatically different from the 5 hours, 45 minutes Prime95 took, I may overlook that in favor of convenience. I could "set and forget" gmp-ecm to run 1000 curves over some period of weeks and not think about it until I check the output. Otherwise it would require regular care and feeding, and who has time for that?
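As an aside, the quoted k-step growth of B2 can be sketched with a toy model (assuming, per VBCurtis's description, that B2 scales linearly with k at a fixed memory footprint; next_b2 is my own made-up helper, not a GMP-ECM function):

```python
def next_b2(b2: float, k: int) -> float:
    """At a fixed memory footprint, each unit of k adds one work-chunk,
    so going from k to k+1 grows B2 by a factor of (k+1)/k (simplified model)."""
    return b2 * (k + 1) / k

# From k=3, the next available B2 is one third bigger:
print(next_b2(3.0e12, 3))  # 4000000000000.0
```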
Old 2015-04-28, 16:53   #61
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

2·13·131 Posts
Default

I thought I'd poke my head into the Prime95 source, specifically the ECM routine.

I'm glancing through ecm.c and, first off, I am not a programmer so most of it I'm skimming.

I saw this in the stage 2 section:
Quote:
/* Process stage 2 in chunks if the bit array will be really large. */
/* By default, the bit array is limited to 250MB. Remember each byte */
/* corresponds to 8 odd numbers which is a range of 16. */
That might explain why stage 2 always seems to use only a limited amount of memory, unlike what P-1 is capable of doing.

The next bit of code has a reference to a config setting I've not seen before, "MaximumBitArraySize". It maxes out at 2000, so I guess you're only going to be able to use up to 2 GB. Is that an old 32-bit OS limitation? Any reason it couldn't be set higher?

The next section of code seems to alter the ending point value based on that bit array size:
Code:
if ((pm1data->C - pm1data->C_start) / 1000000 / 16 > max_bitarray_size)
        pm1data->C = pm1data->C_start + max_bitarray_size * 1000000 * 16;
It would force stage 2 to run in smaller chunks... I guess like the "-k n" option in gmp-ecm.
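Reading that check literally, the stage 2 range a given bit array size can cover works out like this (my interpretation of the quoted source, worth double-checking):

```python
def max_stage2_range(bitarray_mb: int) -> int:
    # Per the quoted comment: each byte covers 8 odd numbers, i.e. a range
    # of 16, so 1 MB of bit array spans 1,000,000 * 16 of stage 2 range.
    return bitarray_mb * 1_000_000 * 16

print(max_stage2_range(250))   # 4000000000  (250 MB default: 4e9 of range per chunk)
print(max_stage2_range(2000))  # 32000000000 (2 GB cap: 3.2e10 of range per chunk)
```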

This may sound dumb of me, but could one reason Prime95 does stage 2 slower than gmp-ecm be its low cap on memory? I may test this theory by comparing stage 2 runs on P95 with gmp-ecm and the "-maxmem 250" option. I'm also curious to try adding "MaximumBitArraySize=2000" to my prime.txt and see what happens. Would it use more RAM in stage 2? Who knows... stay tuned.

Are there other significant differences in the underlying code itself besides the amount of memory they'll use?
Old 2015-04-28, 17:23   #62
axn
 
axn's Avatar
 
Jun 2003

23·683 Posts
Default

Quote:
Originally Posted by Madpoo View Post
Are there other significant differences in the underlying code itself besides the amount of memory they'll use?
Yes. GMP-ECM uses an O(sqrt(B2)) algorithm for stage 2, while P95 uses an O(B2) algorithm.
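For a rough feel of what that asymptotic gap means at this thread's bounds (constants are ignored entirely, so this is only an order-of-magnitude illustration, not a timing prediction):

```python
import math

B2 = 105101237217912  # the B2 from the results line earlier in the thread

# Operation-count proxies with all constant factors dropped:
p95_proxy = B2              # O(B2) stage 2
gmp_proxy = math.isqrt(B2)  # O(sqrt(B2)) stage 2, about 1e7 here
print(p95_proxy // gmp_proxy)  # the asymptotic terms differ by roughly 1e7
```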
Old 2015-04-28, 21:51   #63
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

D4E₁₆ Posts
Default

Quote:
Originally Posted by Madpoo View Post
...
The output of each curve could be boiled down to a compatible P95 results line like:
M1277 completed 1 ECM curve, B1=2900000000, B2=105101237217912

Without a checksum like Prime95 would generate, the server won't accept it, but I could imagine we'd be able to work something out so results from gmp-ecm could be checked in automatically... without emailing George. :)
Well, I'm not so sure now. I see results just like that checked in from other people. Mine didn't work when I pasted it, although I did use "curve" instead of "curves". I got an error about a missing checksum. :)

Maybe only certain trusted users are allowed to add entries like that. Might make sense...wouldn't want just anyone pasting in whatever?

EDIT: Yes, that's it. I found out why my submissions fail while a certain other trustworthy user's are accepted. If I continue crunching away at M1277 or any other ECM work, I could petition George to be included. I think he trusts me.

Last fiddled with by Madpoo on 2015-04-28 at 21:58
Old 2015-04-28, 23:10   #64
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

1D8D₁₆ Posts
Default

Quote:
Originally Posted by VBCurtis View Post
Your paper does not seem to say this, at all. On page 6, the paper claims that for a wide range of K proportions of stage 2 to stage 1 speed, choosing B2 = 0.4K*B1 is optimal.
The mathematical analysis showed that if B2 and B1 are chosen optimally, then \partial B2/\partial B1 = K, where step 2 is K times as fast as step 1.

Thus, if B1 changes to B1 + x, then B2 needs to change to B2 + Kx to remain at optimality. This in turn implies that step 2 should take as long as step 1 to remain at optimality.
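One way to see that final step (my paraphrase under a simplified linear-cost model; this sketch is not taken from the paper): write step 1 time as T_1 = c·B_1 and step 2 time as T_2 = (c/K)(B_2 − B_1), step 2 being K times as fast.

```latex
% Assumed linear-cost model: T_1 = c B_1, \quad T_2 = (c/K)(B_2 - B_1).
% With \partial B_2 / \partial B_1 = K at optimality, a change B_1 \to B_1 + x
% forces B_2 \to B_2 + Kx, so the two times change by
\Delta T_1 = c\,x, \qquad
\Delta T_2 = \frac{c}{K}\,(Kx - x) = c\,x\left(1 - \frac{1}{K}\right) \approx c\,x
\quad (K \gg 1).
% Both times grow at essentially the same marginal rate along the optimal
% curve, which is the sense in which T_2 keeps pace with T_1.
```

So under this (admittedly crude) model, the marginal time added to step 2 matches the marginal time added to step 1, which is why the two times stay equal along the optimal curve.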
Old 2015-04-28, 23:17   #65
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

5·17·89 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
The mathematical analysis showed that if B2 and B1 are chosen optimally, then \partial B2/\partial B1 = K, where step 2 is K times as fast as step 1.

Thus, if B1 changes to B1 + x, then B2 needs to change to B2 + Kx to remain at optimality. This in turn implies that step 2 should take as long as step 1 to remain at optimality.
remain at optimality.
Let me add that as the B2/B1 ratio changes for a particular implementation, step 2 does not REMAIN K times as fast uniformly over the range [B1, B2]. The ".4K" was for a particular implementation on a SUN-3 computer and was determined by experiment. If the machine had had much, much more memory, the ".4" coefficient would have been larger. If enough memory is available that step 2 need not be done in segments, and one uses a "perfect" implementation of a convolution-based step 2, then one should always choose B2 so that step 1 and step 2 take the same time.
Old 2015-04-29, 01:29   #66
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

2·2,927 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
Let me add that as the B2/B1 ratio changes for a particular implementation, step 2 does not REMAIN K times as fast uniformly over the range [B1, B2]. The ".4K" was for a particular implementation on a SUN-3 computer and was determined by experiment. If the machine had had much, much more memory, the ".4" coefficient would have been larger. If enough memory is available that step 2 need not be done in segments, and one uses a "perfect" implementation of a convolution-based step 2, then one should always choose B2 so that step 1 and step 2 take the same time.
If 0.4K was determined by experiment on that implementation, why wouldn't we determine experimentally the correct ratio for the current (non-perfect) GMP-ECM implementation? Why do you insist on badgering everyone that stage 2 time should equal stage 1 time, when your own paper found that stage 2 time = 40% of stage 1 time was optimal on that machine, and our present software is likewise not optimally efficient at stage 1 time = stage 2 time?

I mean, I appreciate that with perfect software and no memory constraint you are correct, but we do not have such software at our disposal. So, why would we choose your mathematically correct but in practice (with current software and memory systems) less efficient ratio?

I have done empirical studies with GMP-ECM and composites around 200 digits, and found that a stage 2 time of 40 to 50% of stage 1 time is quite a bit more efficient than stage 2 time equal to stage 1 time. It appears your paper supports the possibility that my experiments were not in error, and also that your point can be correct in principle yet not for our software. I am trusting the number of curves needed for (say) a t50 that GMP-ECM calculates with the "-v" flag for each variant of B1/B2, as I believe that calculation is coded based on your paper.

If you agree this is possible, please stop telling us to choose B2 to make stage 2 time equal to stage 1 time. Every experiment I have run says this is bad advice with current software. If you disagree, please run your own experiments and post data.