mersenneforum.org  

Old 2015-04-27, 15:42   #45
Madpoo
Serpentine Vermin Jar
 
Jul 2014

5·677 Posts

Quote:
Originally Posted by lycorn View Post
Doing Stage 1 with Prime95, at least for these very small exponents, is definitely the best shot as Prime95 is a lot faster than GMP-ECM for S1. Feeding GMP-ECM with a large number of P95 S1 curves and forgetting it for a while renders the overhead negligible. Well, sort of...
Over the weekend I had a few machines running Prime95 doing just stage 1 at B1=29e8 for M1277. I split 240 curves between the machines, using P95 just for the stage 1.

Now I took those 240 results and between these same systems I can run 10 at a time without causing memory issues (I'm assuming 20 GB per instance). Each instance will process 24 of the results. In detail, it's 4 machines (2x 6-core Xeons). 3 of the 4 had enough extra RAM to run 2 at once, and one of them was only using 50 GB out of 144 GB so I'm giving it 4 to run at once. Too bad that machine has slightly slower CPUs (2.53 GHz instead of 3.47 GHz...oh well).

I just slightly tweaked the process I put together for having gmp-ecm do everything... mostly I didn't want to click around a bunch to set affinity and priority, and I only had to modify it slightly to grab a particular results file.

Anyway, those are all going now. I can't remember what it was last time, but I think it was taking something like 90 minutes or so for it to run stage 2 on each result. I guess in 36 hours I should have all 240 curves finished with stage 2.

BTW, I just let gmp-ecm figure out B2 (it's using B2=105101237217912).
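
The 36-hour estimate above is simple arithmetic, sketched here in Python. The function name `plan_stage2` is only illustrative; the inputs are the figures from the post (240 curves, 10 instances, roughly 90 minutes per stage 2, an assumed 20 GB per instance).

```python
def plan_stage2(curves, instances, minutes_per_curve, gb_per_instance):
    """Return (curves per instance, wall-clock hours, total RAM in GB)."""
    per_instance = -(-curves // instances)  # ceiling division
    hours = per_instance * minutes_per_curve / 60
    total_gb = instances * gb_per_instance
    return per_instance, hours, total_gb


per_instance, hours, total_gb = plan_stage2(240, 10, 90, 20)
print(per_instance, hours, total_gb)  # 24 36.0 200
```

The estimate assumes every instance runs at the same speed; a slower machine would stretch the wall-clock time for its share of the results.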
Old 2015-04-27, 15:47   #46
Madpoo
Serpentine Vermin Jar
 
Jul 2014

5×677 Posts

Quote:
Originally Posted by Madpoo View Post
Anyway, those are all going now. I can't remember what it was last time, but I think it was taking something like 90 minutes or so for it to run stage 2 on each result. I guess in 36 hours I should have all 240 curves finished with stage 2.
It occurred to me that my previous tests were just running one per machine. Doing 2 (or 4) on the same system may create memory contention well beyond what I saw before and slow down the individual threads. I guess I'll find out.

I guess I could try setting the affinity on multiple threads to different NUMA nodes, but I don't even know if that would help. When the process started up it didn't have an affinity so the Windows scheduler (which is NUMA aware) probably didn't know to try and give it memory on those channels?

One more reason the affinity setting *should* be baked into the EXE so it's defined at launch.
Old 2015-04-27, 18:15   #47
Madpoo
Serpentine Vermin Jar
 
Jul 2014

5·677 Posts

Quote:
Originally Posted by Madpoo View Post
It occurred to me that my previous tests were just running one per machine. Doing 2 (or 4) on the same system may create memory contention well beyond what I saw before and slow down the individual threads. I guess I'll find out.

It seems like it might be just fine.

On the faster 3.47 GHz systems it's still taking about 94 minutes to do the stage 2 on each one, which is about the same as before when only one was running (this one has 2 going). I did manually change the affinity so each instance was on its own NUMA node, which may have been a good idea.

On the slower 2.53 GHz systems, which I didn't benchmark previously, it's doing each stage 2 in about 128 minutes.

2.53 GHz / 3.47 GHz = 0.73
94 minutes / 128 minutes = 0.73

So yeah, seems like it scales pretty evenly with CPU speed, everything else being equal (similar servers, similar mem speed, same Xeon class CPU, etc)
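
As a quick check of the two ratios above, here is a small Python sketch. It assumes stage 2 time is inversely proportional to clock speed, which is the hypothesis being tested rather than an established fact.

```python
fast_ghz, slow_ghz = 3.47, 2.53
fast_minutes, slow_minutes = 94, 128

clock_ratio = slow_ghz / fast_ghz          # about 0.729
time_ratio = fast_minutes / slow_minutes   # about 0.734

# Predicted time on the slower machine if time scales as 1/clock.
predicted_slow = fast_minutes * fast_ghz / slow_ghz

print(f"clock ratio {clock_ratio:.3f}, time ratio {time_ratio:.3f}")
print(f"predicted {predicted_slow:.1f} min vs measured {slow_minutes} min")
```

The prediction lands within about a minute of the measured 128 minutes, which fits the "scales pretty evenly" observation.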
Old 2015-04-28, 00:16   #48
lycorn
 
"GIMFS"
Sep 2002
Oeiras, Portugal

1,571 Posts

Well, that's what I call a powerhouse! Those 240 curves will end up counting as probably more than 1000 @ default/current bounds set in the Primenet server.

Have you measured the time each curve is taking on S1? According to some posts in this thread, the ratio S1:S2 should be ~ 1:0.7. So if S2 is taking 94 minutes and S1 more than 94/0.7 = 134 mins, you could try larger B2 values until you attain that ratio.
Old 2015-04-28, 00:34   #49
lycorn
 
"GIMFS"
Sep 2002
Oeiras, Portugal

1,571 Posts

Quote:
Originally Posted by R.D. Silverman View Post
I am curious. How much faster is P95 than GMP-ECM for S1 for Mersenne/Wagstaff numbers?
If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would
be very fast....

I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5).
I've timed some P95 and GMP-ECM runs, and in general P95 S1 is faster than GMP-ECM's. I still have just a few data points, so the results aren't yet that meaningful. I will try to find some time to run some more, and also to compare different B1 values for the same exponent size. I will then post some more robust conclusions.
It's already becoming apparent, though, that for smaller exponents the difference is not as big as for larger ones, in line with what you wrote in your post.
Old 2015-04-28, 03:30   #50
Madpoo
Serpentine Vermin Jar
 
Jul 2014

5×677 Posts

Quote:
Originally Posted by lycorn View Post
Well, that's what I call a powerhouse! Those 240 curves will end up counting as probably more than 1000 @ default/current bounds set in the Primenet server.

Have you measured the time each curve is taking on S1? According to some posts in this thread, the ratio S1:S2 should be ~ 1:0.7. So if S2 is taking 94 minutes and S1 more than 94/0.7 = 134 mins, you could try larger B2 values until you attain that ratio.
Pppphhhbbth.

The server that was taking 128 minutes in stage 2 with gmp-ecm was taking 5 hours, 45 minutes in stage 1 with P95.

Sounds like you're saying that I should really be goosing up B2 until it takes around 4 hours in stage 2?
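
The "around 4 hours" figure can be reproduced with a short Python sketch. It takes the 1:0.7 S1:S2 ratio quoted above as given; `target_stage2_minutes` is an illustrative name, not part of any tool.

```python
def target_stage2_minutes(stage1_minutes, s2_per_s1=0.7):
    """Stage 2 time that matches a given S1:S2 = 1:s2_per_s1 ratio."""
    return stage1_minutes * s2_per_s1


stage1 = 5 * 60 + 45  # 5 hours 45 minutes of stage 1 in Prime95
target = target_stage2_minutes(stage1)

print(f"stage 1: {stage1} min, target stage 2: {target:.1f} min "
      f"(about {target / 60:.1f} h), currently 128 min")
```

With 345 minutes of stage 1 the target comes out near 241 minutes, so the current 128-minute stage 2 is a bit over half of what that ratio suggests.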
Old 2015-04-28, 04:14   #51
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

3³·11·19 Posts

Quote:
Originally Posted by Madpoo View Post
Pppphhhbbth.

The server that was taking 128 minutes in stage 2 with gmp-ecm was taking 5 hours, 45 minutes in stage 1 with P95.

Sounds like you're saying that I should really be goosing up B2 until it takes around 4 hours in stage 2?
Using the -B2scale 4 flag when calling stage 2 GMP-ECM will multiply B2 by four, which will double the memory requirement and double the stage 2 time. If your server can handle that memory load, that will be more efficient (though only 5% or so more efficient than your current default settings; not an "OMG must do!" issue).

Edit: It's less clear that increasing B2 without increasing memory (which increases the number of steps to finish stage 2, a parameter GMP-ECM calls "k") will prove more efficient. If you are memory-limited to the current stage 2 footprint, it may be that only a small increase in B2 is worthwhile. B2 increases in large steps, each corresponding to a unit change in k. If your current test uses k=3, the next B2 would be 1/3 bigger, with k=4 for the same memory footprint. GMP-ECM by default uses k values 2 through 6, followed by a doubling of memory and a reset to k=2 for a bigger, more efficient work-chunk. If you set maxmem={number too small for default k choice}, the program will stick to the smaller work-chunk size, increasing k beyond 6. This is usually less efficient, but experimentation is required (it depends on individual machine specs).
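
As a rough illustration only, the two rules of thumb in this post can be turned into numbers. The Python sketch below assumes that stage 2 time grows with the square root of the B2 multiplier (extrapolated from "B2 times four doubles the time") and that, at a fixed memory footprint, B2 is proportional to k. Both are simplifications; the real B2 moves in discrete steps, so treat the results as a starting point for experiments. The function names are hypothetical.

```python
def b2scale_for_time(current_minutes, target_minutes):
    """B2 multiplier needed if stage 2 time scales as sqrt(B2)."""
    return (target_minutes / current_minutes) ** 2


def b2_gain_next_k(k):
    """Relative B2 after stepping from k to k+1 at fixed memory."""
    return (k + 1) / k


# From 128 minutes to roughly 4 hours of stage 2:
print(b2scale_for_time(128, 240))   # 3.515625
# Stepping from k=3 to k=4 makes B2 one third bigger:
print(f"{b2_gain_next_k(3):.3f}")   # 1.333
```

Under these assumptions a multiplier of about 3.5 would bring stage 2 from 128 minutes to about 4 hours, which is close to the -B2scale 4 suggestion, at the cost of roughly double the memory.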

Last fiddled with by VBCurtis on 2015-04-28 at 04:23 Reason: added detail about k parameter
Old 2015-04-28, 06:08   #52
lorgix
 
Sep 2010
Scandinavia

3·5·41 Posts

I'm pretty sure B2 should be increased, even with a memory constraint.
Old 2015-04-28, 10:44   #53
R.D. Silverman
 
"Bob Silverman"
Nov 2003
North of Boston

2²×1,877 Posts

Quote:
Originally Posted by Madpoo View Post
Pppphhhbbth.

The server that was taking 128 minutes in stage 2 with gmp-ecm was taking 5 hours, 45 minutes in stage 1 with P95.

Sounds like you're saying that I should really be goosing up B2 until it takes around 4 hours in stage 2?
If people would ever bother to READ my joint paper with Sam Wagstaff, they would learn that
optimal ECM performance is obtained when one spends the same amount of TIME in step 1
and step 2.
Old 2015-04-28, 12:09   #54
lycorn
 
"GIMFS"
Sep 2002
Oeiras, Portugal

1,571 Posts

According to posts #16, 22 and 32 of this thread, written by someone who apparently has read your papers, the ratio is 1:0.7, hence my observation. I lack the math background to fully understand what's involved, so I trusted what seemed to come from a reliable source. I'm obviously happy to be corrected by someone as qualified in the subject as yourself.
Old 2015-04-28, 12:33   #55
ATH
Einyen
 
Dec 2003
Denmark

19·181 Posts

Quote:
Originally Posted by R.D. Silverman View Post
I am curious. How much faster is P95 than GMP-ECM for S1 for Mersenne/Wagstaff numbers?
If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would
be very fast....

I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5).
Prime95 and GMP-ECM running on 1 core each, stage 1 only:

Code:
Number   B1     Prime95    GMP-ECM
M1277    110M   16 min     27 min
M1277    44M    6.5 min    9.5 min
M2137    44M    8.5 min    29 min
M10061   44M    34.5 min   277 min
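
To make the trend in these timings easier to see, the Python sketch below computes the Prime95 speedup for each row. The numbers are copied from the table above; nothing else is measured.

```python
# (number, B1, Prime95 minutes, GMP-ECM minutes), stage 1 only, 1 core each
timings = [
    ("M1277", "110M", 16, 27),
    ("M1277", "44M", 6.5, 9.5),
    ("M2137", "44M", 8.5, 29),
    ("M10061", "44M", 34.5, 277),
]

speedups = [(name, b1, gmp / p95) for name, b1, p95, gmp in timings]

for name, b1, ratio in speedups:
    print(f"{name:7s} B1={b1:5s} Prime95 is {ratio:.1f}x faster")
```

The ratio grows from roughly 1.5x to 1.7x for M1277 to about 8x for M10061, consistent with the expectation that Prime95's advantage increases with exponent size.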

Last fiddled with by ATH on 2015-04-28 at 12:33

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
