#144
May 2013
East. Always East.
11·157 Posts
I don't have access to a hyperthreaded CPU, so maybe someone with experience there can help.
Has it been pretty much proven that running a worker on a physical core plus its logical sibling as a helper is no faster than running on the single core alone? I know that trying to run 8 workers on 4 physical cores just produces more heat, but I don't remember how 4 workers on 8 logical cores fares.
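To even set up that experiment you need to know which logical CPUs share a physical core. A minimal sketch, assuming a Linux-style sysfs layout: each logical CPU exposes a `thread_siblings_list` string under `/sys/devices/system/cpu/cpuN/topology/`, and grouping those strings reveals the physical cores. The sample data below is hypothetical; on a real system you would read one string per logical CPU.

```python
# Sketch: group logical CPUs into physical cores from hypothetical
# thread_siblings_list strings (Linux sysfs format, e.g. "0,4" means
# logical CPUs 0 and 4 are HT siblings on one physical core).

def physical_cores(sibling_lists):
    """Each entry is one CPU's thread_siblings_list string.
    Returns the set of sibling groups, one per physical core."""
    cores = set()
    for s in sibling_lists:
        cores.add(tuple(sorted(int(c) for c in s.split(","))))
    return cores

# Hypothetical 4-core/8-thread CPU: logical CPU n and n+4 share a core.
sample = ["0,4", "1,5", "2,6", "3,7", "0,4", "1,5", "2,6", "3,7"]
groups = physical_cores(sample)
print(len(groups))  # 4 physical cores behind 8 logical CPUs
```

Real `thread_siblings_list` files can also use range syntax like `0-1`; this sketch only handles the comma form.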
#145
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2²×7×167 Posts
Quote:
#146
Dec 2002
32E₁₆ Posts
And as far as I understand it, the reason is that hand-written assembler code is so tightly optimized that there is no 'air' left in it that could be squeezed out by pushing two threads of code through a single core, in contrast to compiled code.
#147
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts
Quote:
I found the best speeds by manually assigning cores so that ECM was never on its own core.
#148
May 2013
East. Always East.
11010111111₂ Posts
Quote:
#149
Serpentine Vermin Jar
Jul 2014
7·11·43 Posts
I went ahead and tried this exponent out on my dual-socket E5-2690v2 system (it's a DL380p Gen8), running Windows Server 2012 R2, for what that's worth. Hyperthreading is enabled, but I know that does pretty much nothing for P95; it does make cycles available for the OS, though. Each CPU is 10 physical cores + HT, and there are 2 CPUs installed.

At 10 threads it hits its stride at 79 days and 4 hours, and it pretty much stays in that 79-80 day range all the way up through 20 threads. Past 20 threads, when it starts using the HT cores, it also tends to stay in that same range, sometimes showing a bit longer (up to 100 days) depending on how throttled the system seemed to be at the time.

I thought I remembered doing some benchmarks on the same system a while back and seeing it scale up (slowly, but improving) all the way through the 20 real cores, even though that spanned CPU sockets. I guess that was on a smaller FFT size where the memory bandwidth between sockets wasn't a big limiting factor. Or I just misremembered.

Either way, it did enlighten me: when I'm knocking out some of these oddball exponents (those needing triple-checks, etc.) I can do a lot better than I have been, where I was just throwing all 20 cores at one exponent. I get the same performance with only 10, and I can do two of those at once. For these jobs I want a single exponent tested as fast as possible, because I'm collecting stats on triple-checks or doing double-checks on "suspicious" results, so that works well for me. I know I could get more total throughput by splitting it up further.
#150
"Oliver"
Mar 2005
Germany
11·101 Posts
For (current) NUMA systems it is usually NOT a good idea to run a single worker across multiple CPU sockets (NUMA nodes, to be more specific).
Oliver
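Keeping each worker inside one NUMA node is just a matter of pinning it to that node's logical CPUs. A minimal sketch, assuming a hypothetical 2-socket box with 10 cores per socket where logical CPUs 0-9 live on node 0 and 10-19 on node 1 (real layouts vary; check `lscpu` or `/sys/devices/system/node/` before pinning anything):

```python
# Sketch: per-NUMA-node affinity sets for a hypothetical 2x10-core box.

def node_affinity(node, cores_per_node=10):
    """Return the set of logical CPU ids belonging to one NUMA node,
    assuming nodes own contiguous CPU ranges (an assumption!)."""
    start = node * cores_per_node
    return set(range(start, start + cores_per_node))

# One worker per socket:
worker0 = node_affinity(0)   # CPUs 0..9
worker1 = node_affinity(1)   # CPUs 10..19

# On Linux you could then pin each worker process with, e.g.:
#   os.sched_setaffinity(pid, worker0)
print(sorted(worker0), sorted(worker1))
```

The point of the disjoint sets is that neither worker's FFT data ever has to cross the inter-socket link.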
#151
Einyen
Dec 2003
Denmark
C57₁₆ Posts
I'm curious: what is the timing for this 100M-digit exponent on a Titan GPU?
#152
May 2013
East. Always East.
3277₈ Posts
Yeah, that's the idea of the benchmark: look at total iter/sec throughput and see where the scaling stops, then use that to decide how many cores to assign.
I'm not entirely surprised that 20 cores is no faster than 10 cores.
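The "see where the scaling stops" step can be sketched mechanically: given total iter/sec at each thread count, stop at the last count that still bought a meaningful gain. The numbers and the 5% threshold below are made up for illustration; `scaling_knee` is a hypothetical helper, not anything Prime95 provides.

```python
# Sketch: pick the thread count where throughput scaling levels off.

def scaling_knee(throughput, min_gain=0.05):
    """throughput: {threads: total iter/sec}. Returns the smallest
    thread count beyond which the relative gain from adding more
    threads falls below min_gain (5% by default)."""
    counts = sorted(throughput)
    best = counts[0]
    for prev, cur in zip(counts, counts[1:]):
        if throughput[cur] / throughput[prev] - 1.0 >= min_gain:
            best = cur
        else:
            break
    return best

# Hypothetical benchmark results resembling the dual-socket case above:
bench = {5: 60.0, 10: 110.0, 15: 113.0, 20: 114.0}
print(scaling_knee(bench))  # 10 -- past that, gains are marginal
```

With data shaped like the post above, this points at 10 cores per worker, leaving the second socket free for a second worker.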
#153
Serpentine Vermin Jar
Jul 2014
7×11×43 Posts
Quote:
Right now I'm having difficulty figuring out why Prime95 doesn't seem to split cores correctly: if, for instance, I want one worker using 10 cores and another worker using 10 cores, I'd expect the first worker's 10 cores to all be on one CPU and the other 10 on the other CPU. When I tried that, it seemed to flood one NUMA node with all 20 cores (including the HT).

I posed this question in another thread, so I won't belabor it here, but needless to say I'm trying to wrap my head around just how the AffinityScramble2 thing works. I'm probably psyching myself out by thinking about it in terms of dual CPUs rather than just multiple cores on one CPU. Either way, it's puzzling.
#154
Sep 2010
So Cal
2×5² Posts
Approximately 50 days.
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Perpetual benchmark thread... | Xyzzy | Hardware | 849 | 2021-05-20 12:38 |
| Sieve Benchmark Thread | Historian | Twin Prime Search | 105 | 2013-02-05 01:35 |
| LLR benchmark thread | Oddball | Riesel Prime Search | 5 | 2010-08-02 00:11 |
| sr5sieve Benchmark thread | axn | Sierpinski/Riesel Base 5 | 25 | 2010-05-28 23:57 |
| Old Hardware Thread | E_tron | Hardware | 0 | 2004-06-18 03:32 |