#1
"/X\(‘-‘)/X\"
Jan 2013
110000100011₂ Posts
I've been away for a while, but with winter here, it's time to burn some joules. In the past I mainly focused my CPUs on DC (double-checking), but I understand mprime v30.8 can make efficient use of high-memory systems for P-1.
I have five 4-core Skylake systems with 32 GB of memory each. I also plan to recombobulate four 4-core Haswell systems, also with 32 GB of memory. I can allocate 30 GB or so on each system. They're all a little (but not severely) memory-bandwidth constrained when doing LL/PRP. They all support AVX2/FMA. I don't have time to catch up on all the P-1 minutiae. What's the best way to configure these with regard to workers?

#2
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1110011001111₂ Posts

#3
"Mihai Preda"
Apr 2015
5×17² Posts

#4
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
16317₈ Posts
The benefit scales only logarithmically with the amount of RAM. Per George, 24 GiB is enough in prime95/mprime.
Not all Skylake systems support 128 GiB of RAM.
Last fiddled with by kriesel on 2022-11-15 at 10:08
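To make "only logarithmic" concrete, here is a toy model, assuming (hypothetically) that the stage 2 benefit grows with the base-2 logarithm of the RAM allotted. The slope and the names `relative_benefit` and `gain` are made up for illustration and are not measured from mprime. Under such a model every doubling of RAM adds the same fixed increment, so going from 24 GiB to 30 GiB buys far less than going from 6 GiB to 12 GiB.

```python
import math

# Toy model (assumption, not measured): benefit grows with log2 of stage 2 RAM.
SLOPE_PER_DOUBLING = 1.0  # hypothetical units of "benefit" per doubling of RAM


def relative_benefit(ram_gib: float) -> float:
    """Hypothetical stage 2 benefit for a given RAM allotment in GiB."""
    return SLOPE_PER_DOUBLING * math.log2(ram_gib)


def gain(from_gib: float, to_gib: float) -> float:
    """Extra benefit from raising the RAM allotment, under the toy model."""
    return relative_benefit(to_gib) - relative_benefit(from_gib)


if __name__ == "__main__":
    for lo, hi in [(6, 12), (12, 24), (24, 30), (24, 48)]:
        print(f"{lo:>2} GiB -> {hi:>2} GiB: +{gain(lo, hi):.2f}")
```

The point of the model is only the shape of the curve: past a certain allotment, extra RAM has sharply diminishing returns, which is why a fixed "enough" figure is reasonable.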

#5
Dec 2016
2·3²·7 Posts
One key parameter is the exponent range you are working on. The setup for 100k exponents is different from the setup for 10M exponents, but you didn't mention what you plan to work on.
For stage 1, you want to make sure the work fits in cache. A 100k exponent fits nicely into a CPU core's L2 (or even L1) cache, and it does not scale well with multithreading, so 1 core per worker is a good choice. Larger exponents like 10M will barely fit into the L3 cache, so it's a better idea to have just 1 worker, since multiple workers would just fight over the L3 cache (a.k.a. cache thrashing).
For stage 2, I optimize my setup for RAM utilization, as mprime 30.8 is VERY memory hungry: some workers only do stage 1 work (which uses almost no memory) while others only do stage 2 work. This way the stage 2 workers keep the RAM constantly in use, i.e. the RAM never sits idle.
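As a rough illustration of the cache argument, the sketch below estimates the size of one FFT-form residue for a given exponent and reports the smallest cache level it fits in. The packing density of 19 bits per 8-byte double and the cache sizes (32 KiB L1d, 256 KiB L2, 8 MiB shared L3, typical of a 4-core Skylake desktop part) are assumptions, not prime95's actual FFT selection, and a real worker also keeps tables and temporaries, so treat the numbers as a lower bound. The function names are hypothetical.

```python
import math

# Assumptions (not prime95's actual FFT selection): ~19 bits of the exponent
# are packed into each 8-byte double, and the working set is one residue.
BITS_PER_WORD = 19
BYTES_PER_WORD = 8

# Assumed cache sizes for a typical 4-core Skylake desktop part.
CACHES = [("L1d", 32 * 1024), ("L2", 256 * 1024), ("L3", 8 * 1024 * 1024)]


def residue_bytes(exponent: int) -> int:
    """Approximate bytes needed to hold one FFT-form residue mod 2^p - 1."""
    words = math.ceil(exponent / BITS_PER_WORD)
    return words * BYTES_PER_WORD


def smallest_cache_that_fits(nbytes: int) -> str:
    """Name of the smallest cache level that could hold nbytes, else 'RAM'."""
    for name, size in CACHES:
        if nbytes <= size:
            return name
    return "RAM"


def workers_fitting_in_l3(exponent: int) -> int:
    """How many single-residue working sets the shared L3 could hold at once."""
    return CACHES[-1][1] // residue_bytes(exponent)


if __name__ == "__main__":
    for p in (100_000, 10_000_000):
        size = residue_bytes(p)
        print(f"p={p:>10,}: ~{size / 1024:,.0f} KiB -> "
              f"{smallest_cache_that_fits(size)}, "
              f"{workers_fitting_in_l3(p)} such worker(s) fit in L3")
```

With these assumptions a 100k exponent needs about 41 KiB (an L2-sized working set), while a 10M exponent needs about 4 MiB, so only one such worker fits in an 8 MiB L3 before workers start evicting each other's data.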

#6
"/X\(‘-‘)/X\"
Jan 2013
110000100011₂ Posts

#7
"/X\(‘-‘)/X\"
Jan 2013
13×239 Posts

#8
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1CCF₁₆ Posts
Try one worker and two workers, and use whichever gives better aggregate throughput. Four workers would either leave 3 workers in stage 1 oversaturating the single stage 2 worker that can run at a time, or split the memory between two stage 2 workers at 15 GiB each, which is below George's threshold of 24 GiB.
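A minimal sketch of the arithmetic behind the four-worker case, for a 4-core system with the ~30 GiB the original poster said he can allocate. It assumes (not confirmed anywhere in this thread) that the stage 2 allocation is split evenly among workers that are in stage 2 at the same moment. The names `layouts` and `meets_threshold` are hypothetical. The first row printed corresponds to three workers sitting in stage 1 behind a single stage 2 worker, and the second to two 15 GiB stage 2 workers under the 24 GiB threshold.

```python
# Assumption (not documented mprime behaviour): RAM is split evenly among the
# workers that are in stage 2 at the same moment.
TOTAL_GIB = 30       # what the original poster said he can allocate per system
STAGE2_MIN_GIB = 24  # George's "enough" figure, as quoted in the thread


def layouts(workers: int, total_gib: float = TOTAL_GIB):
    """Yield (workers in stage 2 at once, GiB each, workers not in stage 2)."""
    for in_stage2 in range(1, workers + 1):
        # Workers not in stage 2 are running stage 1 (or waiting for RAM).
        yield in_stage2, total_gib / in_stage2, workers - in_stage2


def meets_threshold(gib_each: float) -> bool:
    """True if each concurrent stage 2 worker gets at least the quoted threshold."""
    return gib_each >= STAGE2_MIN_GIB


if __name__ == "__main__":
    for in_stage2, gib_each, in_stage1 in layouts(4):
        verdict = "ok" if meets_threshold(gib_each) else "below threshold"
        print(f"{in_stage2} in stage 2 @ {gib_each:.1f} GiB each, "
              f"{in_stage1} in stage 1: {verdict}")
```

Only the one-at-a-time layout clears the threshold with 30 GiB, which is why trying one or two workers and comparing measured throughput is the practical test.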

#9
"mrh"
Oct 2018
Temecula, ca
2·3²·5 Posts

#10
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5³×59 Posts
I trust George to have done the sensible thing. A forum thread is a very quick way to get going. We've already churned through over 23M of exponent range in under 3 months. There's ~30M remaining to catch up to the first-test wavefront.
Edit: a quick start was useful to the project because, with the recent improvement in P-1 stage 2 efficiency and the resulting increase in factors found, some DC work that was about to start could be avoided if we acted fast. Perhaps in the long term it would make sense to implement a special P-1 work type for high-memory systems running suitable software. I think users, new and old, who don't raise the memory limit above prime95's conservative default, as well as limited-memory systems, will be with us for a long time. George may choose to program other enhancements first; he is currently still working on the v30.9 ECM enhancement, optimizations for new chip designs, occasional database maintenance and queries, etc. I think there is more to do in handling the disparate performance and efficiency cores of recent hardware. There is no NUMA awareness in prime95/mprime. Mmff could be extended with higher-bit-capable mm127 kernels. There is no Google TPU-compatible TF code for GIMPS. Unfortunately, there is only one of him.
Last fiddled with by kriesel on 2022-11-15 at 19:00

#11
"mrh"
Oct 2018
Temecula, ca
1011010₂ Posts
Oh, that makes sense, thanks.