2020-10-03, 19:16  #12
"Mike"
Aug 2002
2^{2}×13×157 Posts 
It may be obvious that this turbo "thingie" really messes up benchmarking. (It wasn't obvious to us!) You think you are getting a certain speed during the benchmark but it is not sustainable.
It would be neat to have a system where, when nobody is using it interactively, the TDP could drop to the setting with the best throughput per watt. Intel's XTU already lets you tie various power/speed profiles to certain programs, so you could set the system to, say, 25W overall and boost the TDP for games and stuff. Personally, we are most interested in maximum throughput per watt.
2020-10-03, 19:41  #13
"Curtis"
Feb 2005
Riverside, CA
11253_{8} Posts 
Quote:
Then again, if you're using a GPU for production, which computation engine to assign the overhead to gets complicated. 

2020-10-03, 21:38  #14
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5119_{10} Posts 
Not really, if the decision is cpu computing, or adding a gpu and doing both. Then only the marginal cost of the gpu and powering it matter to gpu cost effectiveness. It's the same logic as deep discounted airfares; the airline moves the last seat from airport to airport whether it's filled or empty, and is financially ahead even if only recovering part of the prorated cost, as long as the fare covers more than the incremental cost of filling the seat and hauling the additional weight. The cost of the crew is unchanged, and the cost of aerodynamic drag change is probably quite small.
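The marginal-cost argument above can be sketched in a few lines. Every number below is hypothetical and chosen only for illustration; none of them come from this thread. The point is just that charging the GPU for the host's hardware and power (the "fully loaded" view) gives a higher cost per unit of work than charging it only for what it adds.

```python
def cost_per_unit(hardware_cost, watts, hours, price_per_kwh, work_units):
    """Total cost (hardware plus electricity) divided by work completed."""
    energy_cost = watts / 1000.0 * hours * price_per_kwh
    return (hardware_cost + energy_cost) / work_units

# Hypothetical numbers, for illustration only (not from this thread).
HOURS = 3 * 365 * 24      # three years of 24x7 operation
PRICE = 0.12              # currency units per kWh
HOST_COST, HOST_WATTS = 800.0, 60.0    # the box that runs anyway
GPU_COST, GPU_WATTS = 400.0, 150.0     # the added card
GPU_WORK = 1000.0         # work units the GPU completes over HOURS

# Marginal view: the host is already paid for, so charge the GPU only for itself.
marginal = cost_per_unit(GPU_COST, GPU_WATTS, HOURS, PRICE, GPU_WORK)

# Fully loaded view: charge the GPU for the host's hardware and power as well.
loaded = cost_per_unit(GPU_COST + HOST_COST, GPU_WATTS + HOST_WATTS,
                       HOURS, PRICE, GPU_WORK)

print(f"marginal: {marginal:.3f} per unit, fully loaded: {loaded:.3f} per unit")
```

If the host would be powered on regardless, the marginal figure is the one that decides whether adding the GPU pays off.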
Last fiddled with by kriesel on 2020-10-03 at 21:41
2020-10-03, 22:29  #15
"Mike"
Aug 2002
17744_{8} Posts 
It is true that we have this computer running 24×7 anyways, so the base power cost is kinda free.
We would have to factor in whole system power if we ran a bunch of other boxes. Right now we only have the i9 for our daily driver and an i7 "NUC" for msieve work and file sharing. We set the i7 to run at 1GHz so it is extremely low power at ~10W for the whole system. 
2020-10-04, 02:55  #16
"Mike"
Aug 2002
2^{2}×13×157 Posts 
We happen to have a new i5-10600K as well.
Since it has fewer cores and since it is much cheaper, we are willing to get a little silly with it. Nothing dangerous like changing voltages or stuff like that. Silly as in removing the power and current limiters. (The Intel limits are 125W PL1 and 182W PL2.)
With those removed, the CPU will crank out 127W doing a six core PRP test. It can actually just barely do that test with the limits in place; the power limiter will blink on and off occasionally. With a twelve thread torture test set to "small FFT" and all of the AVX options enabled, the CPU will use 157W. (!) Since no ordinary workload will ever approach that level of stress, we feel confident that the power and current limits could be eliminated for an everyday PC using that CPU, if the cooling is up to the task and if the VRMs and power stage are solid.
Our feeling after playing around with both CPUs is that the six core variant makes a lot of sense. At max draw it uses ~26W per core. If you tried to do 26W per core on a ten core processor, that would be 260W, which isn't trivial. Also, you are probably going to hit the memory bandwidth limit with a four core part, so more cores don't seem to scale super well. (We wish there was a four core i3 "K" part available.)
Now the ten core processor might make a bit of sense if you undervolt it or run it in a reduced power setting. Then you could utilize the "burst capacity" for transient CPU loads. However, in real life use, we are unable to discern which CPU is installed, so we will probably sell the i9 and keep the i5. We will post some benchmarks for the i5 in a while.
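The per-core arithmetic in this post is easy to check. A minimal sketch, using only the wattages quoted above (157W torture test, 127W PRP test, 125W PL1, 182W PL2); the variable and function names are made up for the example:

```python
def watts_per_core(package_watts, cores):
    """Average package power per core."""
    return package_watts / cores

PL1, PL2 = 125.0, 182.0    # Intel limits quoted in the post, in watts
PRP_WATTS = 127.0          # six core PRP test, limits removed
TORTURE_WATTS = 157.0      # twelve thread small-FFT torture test

per_core = watts_per_core(TORTURE_WATTS, 6)   # roughly 26 W per core
ten_core_estimate = per_core * 10             # roughly 260 W

print(f"per core at max draw: {per_core:.1f} W")
print(f"same per-core draw on ten cores: {ten_core_estimate:.0f} W")
print(f"PRP load exceeds PL1: {PRP_WATTS > PL1}")
print(f"torture load exceeds PL2: {TORTURE_WATTS > PL2}")
```

The PRP load sits only 2W above PL1, which fits the observation that the limiter just blinks on and off during that test.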
2020-10-04, 03:19  #17
"Mike"
Aug 2002
2^{2}·13·157 Posts 
Code:
Timings for 6144K FFT length (1 core, 1 worker): 19.77 ms. Throughput: 50.58 iter/sec.
Timings for 6144K FFT length (2 cores, 1 worker): 10.62 ms. Throughput: 94.14 iter/sec.
Timings for 6144K FFT length (2 cores, 2 workers): 21.45, 21.17 ms. Throughput: 93.86 iter/sec.
Timings for 6144K FFT length (3 cores, 1 worker): 7.93 ms. Throughput: 126.17 iter/sec.
Timings for 6144K FFT length (3 cores, 2 workers): 24.92, 12.03 ms. Throughput: 123.25 iter/sec.
Timings for 6144K FFT length (3 cores, 3 workers): 24.75, 24.37, 24.40 ms. Throughput: 122.44 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker): 7.42 ms. Throughput: 134.72 iter/sec.
Timings for 6144K FFT length (4 cores, 2 workers): 15.94, 15.78 ms. Throughput: 126.11 iter/sec.
Timings for 6144K FFT length (4 cores, 3 workers): 32.56, 31.77, 15.85 ms. Throughput: 125.30 iter/sec.
Timings for 6144K FFT length (4 cores, 4 workers): 33.19, 33.04, 32.08, 31.76 ms. Throughput: 123.05 iter/sec.
Timings for 6144K FFT length (5 cores, 1 worker): 7.70 ms. Throughput: 129.80 iter/sec.
Timings for 6144K FFT length (5 cores, 2 workers): 21.23, 13.16 ms. Throughput: 123.06 iter/sec.
Timings for 6144K FFT length (5 cores, 3 workers): 42.13, 20.60, 20.28 ms. Throughput: 121.58 iter/sec.
Timings for 6144K FFT length (5 cores, 4 workers): 41.71, 43.03, 42.17, 20.36 ms. Throughput: 120.05 iter/sec.
Timings for 6144K FFT length (5 cores, 5 workers): 42.24, 42.00, 40.81, 40.18, 41.15 ms. Throughput: 121.18 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker): 8.14 ms. Throughput: 122.87 iter/sec.
Timings for 6144K FFT length (6 cores, 2 workers): 16.92, 16.63 ms. Throughput: 119.25 iter/sec.
Timings for 6144K FFT length (6 cores, 3 workers): 25.61, 25.76, 25.08 ms. Throughput: 117.73 iter/sec.
Timings for 6144K FFT length (6 cores, 4 workers): 54.54, 50.87, 25.39, 25.19 ms. Throughput: 117.08 iter/sec.
Timings for 6144K FFT length (6 cores, 5 workers): 51.48, 52.24, 50.63, 52.32, 24.79 ms. Throughput: 117.77 iter/sec.
Timings for 6144K FFT length (6 cores, 6 workers): 51.73, 52.46, 50.84, 50.08, 49.47, 50.72 ms. Throughput: 117.96 iter/sec.
Code:
Timings for 560K FFT length (1 core, 1 worker): 1.57 ms. Throughput: 636.72 iter/sec.
Timings for 560K FFT length (2 cores, 1 worker): 0.85 ms. Throughput: 1177.30 iter/sec.
Timings for 560K FFT length (2 cores, 2 workers): 1.63, 1.61 ms. Throughput: 1235.07 iter/sec.
Timings for 560K FFT length (3 cores, 1 worker): 0.59 ms. Throughput: 1698.31 iter/sec.
Timings for 560K FFT length (3 cores, 2 workers): 1.64, 0.85 ms. Throughput: 1779.37 iter/sec.
Timings for 560K FFT length (3 cores, 3 workers): 1.69, 1.67, 1.67 ms. Throughput: 1790.08 iter/sec.
Timings for 560K FFT length (4 cores, 1 worker): 0.45 ms. Throughput: 2209.24 iter/sec.
Timings for 560K FFT length (4 cores, 2 workers): 0.88, 0.86 ms. Throughput: 2299.87 iter/sec.
Timings for 560K FFT length (4 cores, 3 workers): 1.73, 1.71, 0.87 ms. Throughput: 2304.95 iter/sec.
Timings for 560K FFT length (4 cores, 4 workers): 2.21, 2.18, 2.18, 2.16 ms. Throughput: 1834.03 iter/sec.
Timings for 560K FFT length (5 cores, 1 worker): 0.37 ms. Throughput: 2668.52 iter/sec.
Timings for 560K FFT length (5 cores, 2 workers): 0.89, 0.60 ms. Throughput: 2788.76 iter/sec.
Timings for 560K FFT length (5 cores, 3 workers): 1.93, 0.97, 0.94 ms. Throughput: 2609.61 iter/sec.
Timings for 560K FFT length (5 cores, 4 workers): 2.44, 2.43, 2.42, 1.07 ms. Throughput: 2175.21 iter/sec.
Timings for 560K FFT length (5 cores, 5 workers): 3.30, 3.03, 3.01, 3.00, 2.99 ms. Throughput: 1632.28 iter/sec.
Timings for 560K FFT length (6 cores, 1 worker): 0.32 ms. Throughput: 3083.32 iter/sec.
Timings for 560K FFT length (6 cores, 2 workers): 0.62, 0.61 ms. Throughput: 3242.26 iter/sec.
Timings for 560K FFT length (6 cores, 3 workers): 1.28, 1.25, 1.22 ms. Throughput: 2401.40 iter/sec.
Timings for 560K FFT length (6 cores, 4 workers): 3.24, 3.07, 1.31, 1.31 ms. Throughput: 2163.49 iter/sec.
Timings for 560K FFT length (6 cores, 5 workers): 3.60, 3.45, 3.43, 3.42, 1.34 ms. Throughput: 1899.57 iter/sec.
Timings for 560K FFT length (6 cores, 6 workers): 3.99, 3.98, 4.02, 3.96, 3.93, 4.00 ms. Throughput: 1507.35 iter/sec.
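For anyone who wants to pick the best core/worker split out of a dump like this without reading it by eye, here is a rough sketch. The regular expression is written against the line format shown above and is not an official parser; `parse` and `best_by_fft` are names made up for this example, and `SAMPLE` is just a handful of lines copied from the two listings.

```python
import re

# Pattern written against the benchmark lines shown in this post.
LINE = re.compile(
    r"Timings for (\d+)K FFT length \((\d+) cores?, (\d+) workers?\): "
    r"([\d., ]+) ms\.\s+Throughput: ([\d.]+) iter/sec"
)

def parse(text):
    """Return one dict per benchmark line found in text."""
    rows = []
    for m in LINE.finditer(text):
        rows.append({
            "fft_k": int(m.group(1)),
            "cores": int(m.group(2)),
            "workers": int(m.group(3)),
            "ms": [float(x) for x in m.group(4).split(",")],
            "throughput": float(m.group(5)),
        })
    return rows

def best_by_fft(rows):
    """Map each FFT length to the row with the highest throughput."""
    best = {}
    for row in rows:
        current = best.get(row["fft_k"])
        if current is None or row["throughput"] > current["throughput"]:
            best[row["fft_k"]] = row
    return best

SAMPLE = """
Timings for 6144K FFT length (1 core, 1 worker): 19.77 ms. Throughput: 50.58 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker): 7.42 ms. Throughput: 134.72 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker): 8.14 ms. Throughput: 122.87 iter/sec.
Timings for 6144K FFT length (6 cores, 6 workers): 51.73, 52.46, 50.84, 50.08, 49.47, 50.72 ms. Throughput: 117.96 iter/sec.
Timings for 560K FFT length (6 cores, 1 worker): 0.32 ms. Throughput: 3083.32 iter/sec.
Timings for 560K FFT length (6 cores, 2 workers): 0.62, 0.61 ms. Throughput: 3242.26 iter/sec.
Timings for 560K FFT length (6 cores, 6 workers): 3.99, 3.98, 4.02, 3.96, 3.93, 4.00 ms. Throughput: 1507.35 iter/sec.
"""

rows = parse(SAMPLE)
best = best_by_fft(rows)
for fft_k, row in sorted(best.items()):
    print(f"{fft_k}K: {row['cores']} cores, {row['workers']} workers, "
          f"{row['throughput']} iter/sec")
```

On the full listings the winners are the same as in the sample: 4 cores with 1 worker at 6144K, and 6 cores with 2 workers at 560K. Because `finditer` scans the whole string, the parser also works if the lines arrive run together on one line.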
2020-10-04, 06:35  #18
"Curtis"
Feb 2005
Riverside, CA
3^{4}×59 Posts 

2020-10-05, 03:01  #19
"Mike"
Aug 2002
17744_{8} Posts 
As an experiment we tested to see how far a simple overclock on the i5 would go.
We ended up at 4.8GHz on all cores with an AVX load but the CPU was using 200W to achieve it! 
2020-10-05, 13:13  #20
"Mike"
Aug 2002
2^{2}·13·157 Posts 
Quote:


2020-10-05, 13:20  #21
"Mike"
Aug 2002
2^{2}·13·157 Posts 
Here are benchmarks for 560K and 6M FFTs with all limits removed:
Code:
Timings for 560K FFT length (1 core, 1 worker): 1.44 ms. Throughput: 695.56 iter/sec.
Timings for 560K FFT length (2 cores, 1 worker): 0.79 ms. Throughput: 1265.70 iter/sec.
Timings for 560K FFT length (3 cores, 1 worker): 0.56 ms. Throughput: 1785.49 iter/sec.
Timings for 560K FFT length (4 cores, 1 worker): 0.43 ms. Throughput: 2335.99 iter/sec.
Timings for 560K FFT length (5 cores, 1 worker): 0.35 ms. Throughput: 2829.04 iter/sec.
Timings for 560K FFT length (6 cores, 1 worker): 0.31 ms. Throughput: 3206.84 iter/sec.
Timings for 560K FFT length (7 cores, 1 worker): 0.28 ms. Throughput: 3593.28 iter/sec.
Timings for 560K FFT length (8 cores, 1 worker): 0.25 ms. Throughput: 3997.59 iter/sec.
Timings for 560K FFT length (9 cores, 1 worker): 0.27 ms. Throughput: 3708.03 iter/sec.
Timings for 560K FFT length (10 cores, 1 worker): 0.27 ms. Throughput: 3703.69 iter/sec.
Timings for 6144K FFT length (1 core, 1 worker): 18.25 ms. Throughput: 54.79 iter/sec.
Timings for 6144K FFT length (2 cores, 1 worker): 9.74 ms. Throughput: 102.63 iter/sec.
Timings for 6144K FFT length (3 cores, 1 worker): 7.48 ms. Throughput: 133.72 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker): 6.84 ms. Throughput: 146.18 iter/sec.
Timings for 6144K FFT length (5 cores, 1 worker): 6.87 ms. Throughput: 145.50 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker): 7.04 ms. Throughput: 141.98 iter/sec.
Timings for 6144K FFT length (7 cores, 1 worker): 7.27 ms. Throughput: 137.47 iter/sec.
Timings for 6144K FFT length (8 cores, 1 worker): 7.50 ms. Throughput: 133.37 iter/sec.
Timings for 6144K FFT length (9 cores, 1 worker): 7.65 ms. Throughput: 130.72 iter/sec.
Timings for 6144K FFT length (10 cores, 1 worker): 7.84 ms. Throughput: 127.56 iter/sec.
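To see where the scaling flattens out, the single-worker throughputs above can be turned into speedup and per-core efficiency figures. This is only a sketch: the lists are the throughput values copied from the listing, and `scaling` and `peak_cores` are made-up helper names.

```python
# Single-worker throughput (iter/sec) for 1..10 cores, copied from the listing.
THROUGHPUT_560K = [695.56, 1265.70, 1785.49, 2335.99, 2829.04,
                   3206.84, 3593.28, 3997.59, 3708.03, 3703.69]
THROUGHPUT_6144K = [54.79, 102.63, 133.72, 146.18, 145.50,
                    141.98, 137.47, 133.37, 130.72, 127.56]

def scaling(throughputs):
    """Return (cores, speedup vs. 1 core, efficiency per core) tuples."""
    base = throughputs[0]
    return [(n, t / base, t / base / n)
            for n, t in enumerate(throughputs, start=1)]

def peak_cores(throughputs):
    """Core count with the highest throughput."""
    return max(range(len(throughputs)), key=lambda i: throughputs[i]) + 1

for name, data in (("560K", THROUGHPUT_560K), ("6144K", THROUGHPUT_6144K)):
    print(f"{name}: peak at {peak_cores(data)} cores")
    for cores, speedup, efficiency in scaling(data):
        print(f"  {cores:2d} cores: speedup {speedup:5.2f}, "
              f"efficiency {efficiency:4.0%}")
```

With these numbers the 6144K run peaks at 4 cores and then gets slower as cores are added, while the 560K run keeps improving up to 8 cores.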
2020-10-05, 14:00  #22
Jun 2003
2×3×827 Posts 
Quote:
Up to 20/8 = 2.5M FFT (or slightly smaller, to account for other overheads) should run entirely within L3.
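A quick check of that estimate, assuming the "20" is a 20 MB L3 cache and the "8" is the 8 bytes taken by one double-precision FFT element (both are readings of the post, not something it spells out). Other working data is ignored, which is why the real cutoff would be somewhat lower.

```python
BYTES_PER_ELEMENT = 8          # assumption: one double per FFT element
L3_BYTES = 20 * 1024 * 1024    # assumption: the "20" above means a 20 MB L3

def fft_bytes(fft_len_k):
    """Size of the FFT data for a length given in K (1024) elements."""
    return fft_len_k * 1024 * BYTES_PER_ELEMENT

def fits_in_l3(fft_len_k, l3_bytes=L3_BYTES):
    """True if the FFT data alone fits in the cache (overheads ignored)."""
    return fft_bytes(fft_len_k) <= l3_bytes

# Largest FFT length (in K) whose data alone fits: 2560K, i.e. 2.5M.
max_fft_k = L3_BYTES // (1024 * BYTES_PER_ELEMENT)

for k in (560, max_fft_k, 6144):
    print(f"{k}K FFT: {fft_bytes(k) / 2**20:.2f} MiB, "
          f"fits in L3: {fits_in_l3(k)}")
```

Under these assumptions the 560K benchmark fits in cache with room to spare, while the 6144K one needs about 48 MiB and has to go out to main memory.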
