mersenneforum.org  

2022-06-30, 00:59   #23
timbit
Mar 2009
2²·5 Posts

Here's 14 workers, 1 core each:


Prime95 64-bit version 30.7, RdtscTiming=1
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=4 (14 cores, 14 workers): 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22 ms. Throughput: 63688.19 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=2 (14 cores, 14 workers): 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21 ms. Throughput: 66648.24 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=1 (14 cores, 14 workers): 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.22, 0.21, 0.21 ms. Throughput: 65654.27 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=4 (14 cores, 14 workers): 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27 ms. Throughput: 52101.51 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=2 (14 cores, 14 workers): 0.24, 0.23, 0.24, 0.23, 0.24, 0.23, 0.23, 0.24, 0.23, 0.24, 0.23, 0.24, 0.24, 0.24 ms. Throughput: 59580.44 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=1 (14 cores, 14 workers): 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.22, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23 ms. Throughput: 61841.43 iter/sec.


It appears mprime keeps selecting Pass1=768, Pass2=64, clm=2 for some reason. That is the second-slowest of the six implementations benchmarked above.
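For anyone who wants to check the ranking without reading the numbers by eye, here is a minimal Python sketch that parses benchmark lines in the format shown above and sorts them by throughput. The regular expression and the function name `rank_by_throughput` are illustrative assumptions, not part of Prime95 or mprime, and the per-worker timings in the sample have been trimmed to the first worker to keep the lines short.

```python
import re

# The six 30.7 results from this post, with the per-worker timing list
# trimmed to the first worker's value (an abbreviation, not new data).
BENCH = """\
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=4 (14 cores, 14 workers): 0.22 ms. Throughput: 63688.19 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=2 (14 cores, 14 workers): 0.21 ms. Throughput: 66648.24 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=1 (14 cores, 14 workers): 0.21 ms. Throughput: 65654.27 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=4 (14 cores, 14 workers): 0.27 ms. Throughput: 52101.51 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=2 (14 cores, 14 workers): 0.24 ms. Throughput: 59580.44 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=1 (14 cores, 14 workers): 0.23 ms. Throughput: 61841.43 iter/sec.
"""

# Hypothetical pattern written for the line format above; it is not an
# official description of the results.bench.txt format.
BENCH_LINE = re.compile(
    r"Pass1=(\d+), Pass2=(\d+), clm=(\d+).*?Throughput: ([\d.]+) iter/sec"
)


def rank_by_throughput(text):
    """Return (throughput, pass1, pass2, clm) tuples, fastest first."""
    rows = []
    for line in text.splitlines():
        match = BENCH_LINE.search(line)
        if match:
            pass1, pass2, clm, throughput = match.groups()
            rows.append((float(throughput), int(pass1), int(pass2), int(clm)))
    return sorted(rows, reverse=True)


ranked = rank_by_throughput(BENCH)
for place, (throughput, pass1, pass2, clm) in enumerate(ranked, start=1):
    print(f"{place}. Pass1={pass1}, Pass2={pass2}, clm={clm}: {throughput} iter/sec")
```

The same function should work on the untrimmed lines, since the pattern skips everything between `clm=` and `Throughput:`.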
2022-06-30, 01:31   #24
timbit
Mar 2009
24₈ Posts

And here's 30.8b15:


Prime95 64-bit version 30.8, RdtscTiming=1
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=4 (14 cores, 14 workers): 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22 ms. Throughput: 63942.02 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=2 (14 cores, 14 workers): 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21 ms. Throughput: 66820.75 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=1 (14 cores, 14 workers): 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21 ms. Throughput: 66012.58 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=4 (14 cores, 14 workers): 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27 ms. Throughput: 52146.49 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=2 (14 cores, 14 workers): 0.23, 0.23, 0.24, 0.23, 0.24, 0.23, 0.24, 0.23, 0.23, 0.23, 0.24, 0.24, 0.24, 0.24 ms. Throughput: 59571.93 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=1 (14 cores, 14 workers): 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.22, 0.23, 0.23, 0.23, 0.23, 0.23 ms. Throughput: 62050.87 iter/sec.


Also attached.
Attached file: results.bench.txt (5.7 KB)
2022-06-30, 03:28   #25
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
29×277 Posts

Quote:
Originally Posted by timbit
Here's 14 workers, 1 core each:

And what happens now if you run mprime with 14 workers?
2022-06-30, 16:45   #26
timbit
Mar 2009
20₁₀ Posts

Quote:
Originally Posted by Prime95
And what happens now if you run mprime with 14 workers?

Looks like all 14 workers are using Pass1=256, Pass2=192, clm=2. Please see attached prime_log_307.txt.

OK, according to results.bench.txt for 14 workers, 1 core each, this is indeed the fastest implementation.

But 14 workers at 1 core each is not practical here: stage 2 ECM would take ~8 GB per worker, and my system has only 32 GB total. For other configurations, which may not be ideal, is there anything else I can do?
Attached file: prime_log_307.txt (7.6 KB)

Last fiddled with by timbit on 2022-06-30 at 16:48 Reason: more comments...
2022-06-30, 18:22   #27
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1ABC₁₆ Posts

Try 7 workers x 2 cores each.
If that alone is not satisfactory, try limiting memory used per worker. In undoc.txt:
Code:
The Memory=n setting in local.txt refers to the total amount of memory the
program can use.  You can also put this in the [Worker #n] section to place
a maximum amount of memory that one particular worker can use.

You can set MaxHighMemWorkers=n in local.txt.  This tells the program how
many workers are allowed to use lots of memory.  This occurs doing stage 2
of P-1, P+1, or ECM on medium-to-large numbers.  Default is available memory / 1GB.

You can set a threshold for what is considered lots of memory in MaxHighMemWorkers
calculations.  In local.txt, set:
    HighMemThreshold=n        (default is 50)
The value n is in MB.

If you were to limit each worker to 6 GB, that would not much reduce the efficiency of stage 2, and it would allow 4 stage 2s to run simultaneously. A fifth would get only the remaining 4 GB of your 28 GB, so a fifth simultaneous stage 2 would be less than certain to occur. Over time, workers waiting for sufficient RAM to run stage 2 would desynchronize, so that not all workers run stage 1 at the same time; at any given moment the set of workers runs a mix of stage 1 and stage 2. When one completes stage 2, its memory goes back into the pool to be given to another worker later.
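The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: it assumes the 28 above means 28 GB made available to the program, it takes the ~8 GB per worker figure from the earlier post, and the function name `stage2_slots` is made up for this example. It says nothing about how mprime actually schedules memory.

```python
def stage2_slots(total_gb, per_worker_gb):
    """Return (workers that fit at the full per-worker cap, leftover GB)."""
    full_slots, leftover_gb = divmod(total_gb, per_worker_gb)
    return full_slots, leftover_gb


# Uncapped case from the earlier post: 14 workers at ~8 GB each.
demand_gb = 14 * 8
print(f"14 workers x 8 GB = {demand_gb} GB wanted, on a 32 GB system")

# Capped case: 6 GB per worker out of an assumed 28 GB pool.
full_slots, leftover_gb = stage2_slots(28, 6)
print(f"{full_slots} workers at 6 GB each, {leftover_gb} GB left for a fifth")
```

Changing the two arguments to `stage2_slots` gives a quick way to compare other per-worker caps before editing local.txt.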

Last fiddled with by kriesel on 2022-06-30 at 18:27
2022-07-19, 19:18   #28
timbit
Mar 2009
20₁₀ Posts

I just noticed that the mersenne.org "Download software" page links to 30.8 b15.
The latest announcement on the main page lists version 30.7 as released, so I never bothered to check for a later stable release. I guess I need to check the downloads page more often.
I'm still experimenting with the timings, but I have taken the advice and, for small FFTs, am using only 1 or 2 threads per worker. Throughput has increased, thanks.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
