#1
Jan 2015
11·23 Posts
Hi.
So, on these humongous servers (and in the future, when this stuff trickles down to everyone else), it would be helpful to have a summary at the end of running the timing tests. Finding the pertinent information is doable but less fun when there are 2 sockets of 22 hyperthreaded cores each, doing 1000 iterations. (A high number of iterations is becoming very important for a representative result because Turbo Boost is more and more of a factor: the CPU I'm referencing in this thread has a Turbo Boost range of 1600 MHz, from 2.1 to 3.7 GHz!)
Useful items to add for each core-count iteration:
1) Minimum, maximum, mean, and biggest range of the ms timings.
2) Which number of cores gives you the best timing bang for the buck. (This is going to get pertinent as more channels of RAM come out... AMD EPYC has eight channels :O. I'm pretty sure I'm reaching the performance limit of my RAM.) My general rule is that it's always going to be 2 threads, but I'd like to be validated on that.
3) I forgot what 3 was. I'm having a brain fart; maybe an edit to follow.
4) 4 isn't 3. More than 1000 iterations as an option, please.
5) Maybe some kind of option for a parallel multiplier to get the host under load. Turbo Boost only serves to skew the results: every time you start a thread, the clock frequency fluctuates wildly. If you're using the entire box for mprime, the timings may not be representative once it reaches 100% utilization. Something like running the same timing test concurrently X times with Y threads.
Last fiddled with by aurashift on 2017-11-24 at 18:23
|
|
|
|
#2
P90 years forever!
Aug 2002
Yeehaw, FL
2·3,767 Posts
I assume you are talking about the Advanced/Time menu choice.
Try using the Options/Benchmark menu choice instead. The throughput benchmarks are what I use to decide the best configuration (on my admittedly small-core-count machines). Of special interest, you can enter a comma-separated list of worker and core counts to benchmark, which greatly reduces the number of less-than-useful combinations.
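The point of a throughput comparison is that several workers running at once are judged by their combined rate, not by any single worker's ms/iter. A minimal Python sketch of that comparison, assuming you have one ms/iter timing per worker, all measured simultaneously, for each workers/cores combination; the function names and the timing values are hypothetical, and this only approximates what the benchmark reports.

```python
def throughput(ms_per_iter):
    """Combined iterations/second for workers running concurrently.

    ms_per_iter: one ms/iter timing per worker, all measured at the same time.
    """
    return sum(1000.0 / ms for ms in ms_per_iter)


def best_config(results):
    """results: dict {(workers, cores): [ms/iter per worker]} -> best key."""
    return max(results, key=lambda k: throughput(results[k]))


# Hypothetical timings, not real measurements.
results = {
    (1, 18): [2.0],
    (2, 18): [3.6, 3.7],
    (3, 18): [5.6, 5.5, 5.7],
}
for config, timings in results.items():
    print(config, f"{throughput(timings):.1f} iter/sec")
print("best:", best_config(results))
```

A single worker can have the fastest ms/iter and still lose on total throughput, which is why the combined rate is the number to compare.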
|
|
|
|
|
#3
Jan 2015
253₁₀ Posts
Maybe you're right; I need to wrap my head around this better. Is the FFT running in the CPU cache? I'm not even sure I know what questions to ask.
Last fiddled with by aurashift on 2017-11-24 at 23:16
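Whether an FFT fits in cache can be estimated from its length. A rough Python sketch, assuming one 8-byte double per FFT word and ignoring weights, sin/cos tables and other overhead (so the real working set is somewhat larger); `fft_data_mib` is a hypothetical helper name.

```python
def fft_data_mib(fft_length):
    """Approximate FFT data size in MiB, assuming 8 bytes (one double) per word.

    Ignores weights, sin/cos tables and other per-FFT overhead.
    """
    return fft_length * 8 / 2**20


for label, n in [("256K", 256 * 1024), ("1M", 2**20), ("4M", 4 * 2**20)]:
    print(f"{label:>4} FFT: ~{fft_data_mib(n):.0f} MiB of data")
```

Under that assumption a 4M FFT's data alone is about 32 MiB, so compare the printed sizes against the L2 and L3 sizes listed for your CPU. Once the data outgrows the cache, RAM bandwidth starts to set the pace.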
|
|
|
|
|
#4
Jan 2015
11·23 Posts
I guess I want to know if there are any unusual tweaks I can use to better utilize this: https://ark.intel.com/products/12049...Cache-2_10-GHz
|
|
|
|
|
|
#5
"/X\(‘-‘)/X\"
Jan 2013
2×5×293 Posts
Also, memory bandwidth will factor in. Going by my Skylake experience, 6 channels of DDR4-2666 will hit a bottleneck at around 14 or 15 cores on that chip. Any cores beyond that will probably make nothing but heat. I would suggest benchmarking:
1 and 2 workers: 12, 14, 16, and 18 cores
3 workers: 12, 15, and 18 cores
12, 14, 16, and 18 workers: 1 core each
If the 18-core case is better than the 15/16-core case, try 21/22 workers/cores, too. And be sure to say yes to benchmarking All Complex FFTs; some of those timings are quite a bit faster. New mprime will eventually figure out the fastest FFT on its own, but it won't adjust the number of workers/cores to use, so you need to benchmark the FFTs to decide how many workers and cores to run.
Last fiddled with by Mark Rose on 2017-11-25 at 00:10
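Written out as data, the suggested first pass is a small grid. A Python sketch, assuming the "1 core" cases mean one core per worker and that each run is summarized by its best throughput in iter/sec; `plan` and `extra_core_counts` are hypothetical names, and this only builds the list of combinations to enter in the benchmark dialog, it doesn't run mprime.

```python
# Suggested starting grid as (workers, total cores) pairs. The "1 core"
# cases are read as one core per worker, i.e. workers == cores.
plan = (
    [(w, c) for w in (1, 2) for c in (12, 14, 16, 18)]
    + [(3, c) for c in (12, 15, 18)]
    + [(w, w) for w in (12, 14, 16, 18)]
)


def extra_core_counts(best_throughput):
    """Return [21, 22] if 18 cores beat both 15 and 16 cores, else [].

    best_throughput: {total cores: best iter/sec seen at that core count}
    """
    middle = max(best_throughput.get(15, 0.0), best_throughput.get(16, 0.0))
    return [21, 22] if best_throughput.get(18, 0.0) > middle else []


for workers, cores in plan:
    print(f"{workers:>2} worker(s) on {cores:>2} cores")
print(len(plan), "runs in the first pass")
```

Fifteen runs instead of every workers/cores pairing up to 22 cores keeps the benchmark session short enough to repeat.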
|
|
|
|
|
|
#6
Aug 2002
2·29 Posts
I've been using its 22-core sibling, the E5-2696v4, for a few months now. With all 22 cores in one worker, it gets down to about 1.39 ms/iter at the 4M FFT size. I did not find any combination with multiple workers that yielded better throughput than a single worker on all cores. It usually runs at 2600 or 2700 MHz with all cores active.
So a single 22-core Skylake with 6 channels at 2666 MT/s should be able to do a bit better. The fact that it's a 2S system will probably hurt somewhat. If your RAM is rated for a higher speed than 2666 MT/s, you won't be able to run it at a higher clock, but you should be able to lower tCL, tRCD, tRP, and tRAS somewhat.
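The "should be able to do a bit better" estimate is largely a memory-bandwidth argument. A rough Python sketch of theoretical peak bandwidth (channels × transfer rate × 8 bytes per 64-bit DDR4 channel). The 4 × DDR4-2400 figure for the E5-2696v4 is an assumption based on that platform's maximum supported memory speed, not something stated here, and sustained bandwidth is always below these peaks; `peak_bandwidth_gbs` is a hypothetical helper name.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s for 64-bit (8-byte) channels."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


skylake = peak_bandwidth_gbs(6, 2666)    # 6 channels of DDR4-2666, per socket
broadwell = peak_bandwidth_gbs(4, 2400)  # assumed: 4 channels of DDR4-2400
print(f"Skylake-SP:   {skylake:.1f} GB/s, {skylake / 22:.2f} GB/s per core")
print(f"Broadwell-EP: {broadwell:.1f} GB/s, {broadwell / 22:.2f} GB/s per core")
```

These are per-socket peaks; dividing by the core count shows how quickly each extra core's share of bandwidth shrinks once the FFT no longer fits in cache.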
|
|
|