mersenneforum.org i9 observations

2020-10-03, 19:16   #12
Xyzzy

"Mike"
Aug 2002

5·7·227 Posts

Quote:
 Originally Posted by Xyzzy We turned off the short term "turbo" thingie.
It may be obvious that this turbo "thingie" really messes up benchmarking. (It wasn't obvious to us!) You think you are getting a certain speed during the benchmark but it is not sustainable.

It would be neat to have a system where, when nobody is using it interactively, the power limit could drop to the most efficient throughput-per-watt point. Intel's XTU already lets you tie various power/speed profiles to specific programs, so you could set the system to, say, 25W overall and boost the TDP for games and such.

Personally, we are most interested in maximum throughput per watt.

2020-10-03, 19:41   #13
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

4622₁₀ Posts

Quote:
 Originally Posted by Xyzzy Personally, we are most interested in maximum throughput per watt.
Likewise, but the "per watt" should be for the whole system, right? A wall-plug meter like a Kill A Watt shows how much power the machine is actually drawing, which you can compare to production. Dropping the CPU power 20% and getting 10% less compute may be a loss per watt, since the rest of the system has a nearly fixed power draw.
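The fixed-overhead point is easy to see with arithmetic. A minimal sketch (all numbers are illustrative, not measurements from this thread): drop CPU power 20% for a 10% throughput loss, and whether that wins per wall watt depends entirely on how much the rest of the system draws.

```python
# Sketch: why "per watt" should be measured at the wall, not just the CPU.
# All numbers below are illustrative, not measurements from this thread.

def iters_per_wall_watt(throughput, cpu_watts, rest_watts):
    """Throughput divided by total wall-plug draw (CPU plus fixed overhead)."""
    return throughput / (cpu_watts + rest_watts)

# Drop CPU power 20% (125W -> 100W) for 10% less compute (100 -> 90 iter/s).
# With a light fixed overhead the throttled setting wins; with a heavy one it loses.
for rest in (50.0, 150.0):
    full = iters_per_wall_watt(100.0, 125.0, rest)
    slow = iters_per_wall_watt(90.0, 100.0, rest)
    print(f"overhead {rest:.0f}W: full={full:.3f} iter/s/W, throttled={slow:.3f} iter/s/W")
```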

Then again, if you're using a GPU for production, which computation engine to assign the overhead to gets complicated.

2020-10-03, 21:38   #14
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4848₁₀ Posts

Quote:
 Originally Posted by VBCurtis Then again, if you're using a GPU for production, which computation engine to assign the overhead to gets complicated.
Not really, if the decision is CPU computing alone versus adding a GPU and doing both. Then only the marginal cost of the GPU and of powering it matters to GPU cost-effectiveness. It's the same logic as deeply discounted airfares: the airline moves the last seat from airport to airport whether it's filled or empty, and is financially ahead even if it recovers only part of the prorated cost, as long as the fare covers more than the incremental cost of filling the seat and hauling the additional weight. The cost of the crew is unchanged, and the change in aerodynamic-drag cost is probably quite small.
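The marginal-cost argument in one small sketch (illustrative numbers only, not figures from this thread): a box that runs 24/7 anyway charges its base draw to the CPU work, so the GPU is judged only on the extra watts it adds.

```python
# Sketch of the marginal-cost argument: if the box already runs 24/7 for CPU
# work, the added GPU should be judged only on the extra watts it draws.
# Illustrative numbers only.

base_watts = 150.0    # whole system, already running regardless
gpu_watts = 200.0     # additional draw once the GPU is installed
gpu_output = 500.0    # the GPU's own throughput, arbitrary units

naive = gpu_output / (base_watts + gpu_watts)  # charges the GPU for the whole box
marginal = gpu_output / gpu_watts              # only the incremental cost
print(naive, marginal)  # marginal accounting makes the GPU look ~75% better
```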

Last fiddled with by kriesel on 2020-10-03 at 21:41

2020-10-03, 22:29   #15
Xyzzy

"Mike"
Aug 2002

5·7·227 Posts

It is true that we have this computer running 24×7 anyway, so the base power cost is essentially free. We would have to factor in whole-system power if we ran a bunch of other boxes. Right now we only have the i9 for our daily driver and an i7 "NUC" for msieve work and file sharing. We set the i7 to run at 1GHz, so it is extremely low power at ~10W for the whole system.
2020-10-04, 02:55   #16
Xyzzy

"Mike"
Aug 2002

5·7·227 Posts

We happen to have a new i5-10600K as well. Since it has fewer cores and is much cheaper, we are willing to get a little silly with it. Nothing dangerous like changing voltages; silly as in removing the power and current limiters. (The Intel limits are 125W PL1 and 182W PL2.)

With those removed, the CPU will crank out 127W doing a six-core PRP test. It can actually just barely do that test with the limits in place; the power limiter will blink on and off occasionally. With a twelve-thread torture test set to "small FFT" and all of the AVX options enabled, the CPU will use 157W. (!) Since no ordinary workload will ever approach that level of stress, we feel confident that the power and current limits could be eliminated for an everyday PC using this CPU, provided the cooling is up to the task and the VRMs and power stage are solid.

Our feeling after playing around with both CPUs is that the six-core variant makes a lot of sense. At max draw it uses ~26W per core. Doing 26W per core on a ten-core processor would be 260W, which isn't trivial. Also, you will probably hit the memory bandwidth limit with even a four-core part, so more cores don't seem to scale super well. (We wish there were a four-core i3 "K" part available.)

The ten-core processor might make a bit of sense if you undervolt it or run it in a reduced power setting; then you could use the "burst capacity" for transient CPU loads. However, in real-life use we are unable to tell which CPU is installed, so we will probably sell the i9 and keep the i5. We will post some benchmarks for the i5 in a while.
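The per-core arithmetic in the post extrapolates naively; a tiny sketch of it (the ~26 W/core figure is the post's rough number, not a measured constant):

```python
# Back-of-envelope from the post: ~26 W/core at max draw on the six-core
# part, extrapolated naively to other core counts.

WATTS_PER_CORE = 26  # rough figure from the post, not a measured constant

for cores in (4, 6, 10):
    print(f"{cores} cores -> ~{cores * WATTS_PER_CORE} W")  # 10 cores -> ~260 W
```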
2020-10-04, 03:19   #17
Xyzzy

"Mike"
Aug 2002

5·7·227 Posts

Code:
Timings for 6144K FFT length (1 core, 1 worker): 19.77 ms. Throughput: 50.58 iter/sec.
Timings for 6144K FFT length (2 cores, 1 worker): 10.62 ms. Throughput: 94.14 iter/sec.
Timings for 6144K FFT length (2 cores, 2 workers): 21.45, 21.17 ms. Throughput: 93.86 iter/sec.
Timings for 6144K FFT length (3 cores, 1 worker): 7.93 ms. Throughput: 126.17 iter/sec.
Timings for 6144K FFT length (3 cores, 2 workers): 24.92, 12.03 ms. Throughput: 123.25 iter/sec.
Timings for 6144K FFT length (3 cores, 3 workers): 24.75, 24.37, 24.40 ms. Throughput: 122.44 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker): 7.42 ms. Throughput: 134.72 iter/sec.
Timings for 6144K FFT length (4 cores, 2 workers): 15.94, 15.78 ms. Throughput: 126.11 iter/sec.
Timings for 6144K FFT length (4 cores, 3 workers): 32.56, 31.77, 15.85 ms. Throughput: 125.30 iter/sec.
Timings for 6144K FFT length (4 cores, 4 workers): 33.19, 33.04, 32.08, 31.76 ms. Throughput: 123.05 iter/sec.
Timings for 6144K FFT length (5 cores, 1 worker): 7.70 ms. Throughput: 129.80 iter/sec.
Timings for 6144K FFT length (5 cores, 2 workers): 21.23, 13.16 ms. Throughput: 123.06 iter/sec.
Timings for 6144K FFT length (5 cores, 3 workers): 42.13, 20.60, 20.28 ms. Throughput: 121.58 iter/sec.
Timings for 6144K FFT length (5 cores, 4 workers): 41.71, 43.03, 42.17, 20.36 ms. Throughput: 120.05 iter/sec.
Timings for 6144K FFT length (5 cores, 5 workers): 42.24, 42.00, 40.81, 40.18, 41.15 ms. Throughput: 121.18 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker): 8.14 ms. Throughput: 122.87 iter/sec.
Timings for 6144K FFT length (6 cores, 2 workers): 16.92, 16.63 ms. Throughput: 119.25 iter/sec.
Timings for 6144K FFT length (6 cores, 3 workers): 25.61, 25.76, 25.08 ms. Throughput: 117.73 iter/sec.
Timings for 6144K FFT length (6 cores, 4 workers): 54.54, 50.87, 25.39, 25.19 ms. Throughput: 117.08 iter/sec.
Timings for 6144K FFT length (6 cores, 5 workers): 51.48, 52.24, 50.63, 52.32, 24.79 ms. Throughput: 117.77 iter/sec.
Timings for 6144K FFT length (6 cores, 6 workers): 51.73, 52.46, 50.84, 50.08, 49.47, 50.72 ms. Throughput: 117.96 iter/sec.
Code:
Timings for 560K FFT length (1 core, 1 worker): 1.57 ms. Throughput: 636.72 iter/sec.
Timings for 560K FFT length (2 cores, 1 worker): 0.85 ms. Throughput: 1177.30 iter/sec.
Timings for 560K FFT length (2 cores, 2 workers): 1.63, 1.61 ms. Throughput: 1235.07 iter/sec.
Timings for 560K FFT length (3 cores, 1 worker): 0.59 ms. Throughput: 1698.31 iter/sec.
Timings for 560K FFT length (3 cores, 2 workers): 1.64, 0.85 ms. Throughput: 1779.37 iter/sec.
Timings for 560K FFT length (3 cores, 3 workers): 1.69, 1.67, 1.67 ms. Throughput: 1790.08 iter/sec.
Timings for 560K FFT length (4 cores, 1 worker): 0.45 ms. Throughput: 2209.24 iter/sec.
Timings for 560K FFT length (4 cores, 2 workers): 0.88, 0.86 ms. Throughput: 2299.87 iter/sec.
Timings for 560K FFT length (4 cores, 3 workers): 1.73, 1.71, 0.87 ms. Throughput: 2304.95 iter/sec.
Timings for 560K FFT length (4 cores, 4 workers): 2.21, 2.18, 2.18, 2.16 ms. Throughput: 1834.03 iter/sec.
Timings for 560K FFT length (5 cores, 1 worker): 0.37 ms. Throughput: 2668.52 iter/sec.
Timings for 560K FFT length (5 cores, 2 workers): 0.89, 0.60 ms. Throughput: 2788.76 iter/sec.
Timings for 560K FFT length (5 cores, 3 workers): 1.93, 0.97, 0.94 ms. Throughput: 2609.61 iter/sec.
Timings for 560K FFT length (5 cores, 4 workers): 2.44, 2.43, 2.42, 1.07 ms. Throughput: 2175.21 iter/sec.
Timings for 560K FFT length (5 cores, 5 workers): 3.30, 3.03, 3.01, 3.00, 2.99 ms. Throughput: 1632.28 iter/sec.
Timings for 560K FFT length (6 cores, 1 worker): 0.32 ms. Throughput: 3083.32 iter/sec.
Timings for 560K FFT length (6 cores, 2 workers): 0.62, 0.61 ms. Throughput: 3242.26 iter/sec.
Timings for 560K FFT length (6 cores, 3 workers): 1.28, 1.25, 1.22 ms. Throughput: 2401.40 iter/sec.
Timings for 560K FFT length (6 cores, 4 workers): 3.24, 3.07, 1.31, 1.31 ms. Throughput: 2163.49 iter/sec.
Timings for 560K FFT length (6 cores, 5 workers): 3.60, 3.45, 3.43, 3.42, 1.34 ms. Throughput: 1899.57 iter/sec.
Timings for 560K FFT length (6 cores, 6 workers): 3.99, 3.98, 4.02, 3.96, 3.93, 4.00 ms. Throughput: 1507.35 iter/sec.
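Benchmark dumps in this format are easy to mine mechanically. A minimal sketch (plain Python; the regex is written to match the quoted lines, and the sample data is a small excerpt from the results above) that picks the best cores/workers combination per FFT size:

```python
import re

# Minimal sketch: pick the highest-throughput (cores, workers) combo per FFT
# size from Prime95-style benchmark lines like the ones quoted above.

LINE = re.compile(
    r"Timings for (\S+) FFT length \((\d+) cores?, (\d+) workers?\): "
    r".*?Throughput: ([\d.]+) iter/sec\."
)

def best_configs(text):
    best = {}  # fft size -> (throughput, cores, workers)
    for m in LINE.finditer(text):
        fft, cores, workers = m.group(1), int(m.group(2)), int(m.group(3))
        thr = float(m.group(4))
        if fft not in best or thr > best[fft][0]:
            best[fft] = (thr, cores, workers)
    return best

sample = (
    "Timings for 6144K FFT length (4 cores, 1 worker): 7.42 ms. Throughput: 134.72 iter/sec.\n"
    "Timings for 6144K FFT length (6 cores, 6 workers): 51.73 ms. Throughput: 117.96 iter/sec.\n"
    "Timings for 560K FFT length (6 cores, 2 workers): 0.62, 0.61 ms. Throughput: 3242.26 iter/sec.\n"
)
print(best_configs(sample))
```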
2020-10-04, 06:35   #18
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2·2,311 Posts

Quote:
 Originally Posted by Xyzzy We will post some benchmarks for the i5 in a while.
If you have time, I am interested to see how the msieve benchmark we set up last winter goes.

2020-10-05, 03:01   #19
Xyzzy

"Mike"
Aug 2002

17411₈ Posts

As an experiment, we tested how far a simple overclock on the i5 would go. We ended up at 4.8GHz on all cores with an AVX load, but the CPU was using 200W to achieve it!
2020-10-05, 13:13   #20
Xyzzy

"Mike"
Aug 2002

1111100001001₂ Posts

Quote:
 Originally Posted by Xyzzy With the 250W limiter a different limit kicks in at around 145W. It is called the "current/EDP" limit. We haven't messed around with changing that yet. It sounds kinda scary.
So if you remove the "current/EDP" limiter and run a 20 thread torture test with small FFTs and all of the AVX options turned on, the i9 hits 302 watts! Amazingly, we can run that sustained with our cooling setup, but the CPU goes up to 85C and the VRMs go up to 82C. We stopped after five minutes just to be safe, and we don't plan on doing this again!

2020-10-05, 13:20   #21
Xyzzy

"Mike"
Aug 2002

7945₁₀ Posts

Here are benchmarks for 560K and 6M FFTs with all limits removed:
Code:
Timings for 560K FFT length (1 core, 1 worker): 1.44 ms. Throughput: 695.56 iter/sec.
Timings for 560K FFT length (2 cores, 1 worker): 0.79 ms. Throughput: 1265.70 iter/sec.
Timings for 560K FFT length (3 cores, 1 worker): 0.56 ms. Throughput: 1785.49 iter/sec.
Timings for 560K FFT length (4 cores, 1 worker): 0.43 ms. Throughput: 2335.99 iter/sec.
Timings for 560K FFT length (5 cores, 1 worker): 0.35 ms. Throughput: 2829.04 iter/sec.
Timings for 560K FFT length (6 cores, 1 worker): 0.31 ms. Throughput: 3206.84 iter/sec.
Timings for 560K FFT length (7 cores, 1 worker): 0.28 ms. Throughput: 3593.28 iter/sec.
Timings for 560K FFT length (8 cores, 1 worker): 0.25 ms. Throughput: 3997.59 iter/sec.
Timings for 560K FFT length (9 cores, 1 worker): 0.27 ms. Throughput: 3708.03 iter/sec.
Timings for 560K FFT length (10 cores, 1 worker): 0.27 ms. Throughput: 3703.69 iter/sec.
Timings for 6144K FFT length (1 core, 1 worker): 18.25 ms. Throughput: 54.79 iter/sec.
Timings for 6144K FFT length (2 cores, 1 worker): 9.74 ms. Throughput: 102.63 iter/sec.
Timings for 6144K FFT length (3 cores, 1 worker): 7.48 ms. Throughput: 133.72 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker): 6.84 ms. Throughput: 146.18 iter/sec.
Timings for 6144K FFT length (5 cores, 1 worker): 6.87 ms. Throughput: 145.50 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker): 7.04 ms. Throughput: 141.98 iter/sec.
Timings for 6144K FFT length (7 cores, 1 worker): 7.27 ms. Throughput: 137.47 iter/sec.
Timings for 6144K FFT length (8 cores, 1 worker): 7.50 ms. Throughput: 133.37 iter/sec.
Timings for 6144K FFT length (9 cores, 1 worker): 7.65 ms. Throughput: 130.72 iter/sec.
Timings for 6144K FFT length (10 cores, 1 worker): 7.84 ms. Throughput: 127.56 iter/sec.
It looks like the memory bandwidth is saturated at 8 cores for the 560K FFT and at 4 cores for the 6M FFT.
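The saturation points are visible directly in the numbers. A small sketch that finds the core count where each FFT's 1-worker throughput peaks, using the figures quoted in the post:

```python
# Sketch: locate the core count where throughput peaks, using the 1-worker
# numbers quoted above (iter/sec, index 0 = 1 core).

thr_560k = [695.56, 1265.70, 1785.49, 2335.99, 2829.04,
            3206.84, 3593.28, 3997.59, 3708.03, 3703.69]
thr_6144k = [54.79, 102.63, 133.72, 146.18, 145.50,
             141.98, 137.47, 133.37, 130.72, 127.56]

def peak_cores(throughputs):
    """1-indexed core count with the highest throughput."""
    return max(range(len(throughputs)), key=throughputs.__getitem__) + 1

print(peak_cores(thr_560k), peak_cores(thr_6144k))  # 8 and 4, matching the post
```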
2020-10-05, 14:00   #22
axn

Jun 2003

2·2,423 Posts

Quote:
 Originally Posted by Xyzzy It looks like the memory bandwidth is saturated at 8 cores for the 560K FFT and at 4 cores for the 6M FFT.
Unlikely that the 560K result has anything to do with memory. The i9 is a 20MB-L3 part, so the whole FFT should run entirely from L3 and shouldn't hit RAM at all. You're probably just seeing multi-threading overhead (particularly since it is such a small FFT size).

Up to a 20/8 = 2.5M FFT (or slightly smaller, to account for other overheads) should run entirely within L3.
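The 20/8 arithmetic spelled out as a sketch (the 8 bytes/point figure assumes one double per FFT element and ignores twiddle factors and other working-set overhead, per the caveat above):

```python
# axn's estimate: at ~8 bytes per FFT point (one double), a 20 MB L3 holds
# roughly 20 MB / 8 B = 2.5M points. Twiddles and other overhead are
# ignored, hence "or slightly smaller".

L3_BYTES = 20 * 1024 * 1024
BYTES_PER_POINT = 8  # one double per FFT element; an assumption, not measured

def fits_in_l3(fft_points):
    return fft_points * BYTES_PER_POINT <= L3_BYTES

print(fits_in_l3(560 * 1024))   # True: a 560K FFT needs ~4.4 MB
print(fits_in_l3(6144 * 1024))  # False: a 6M FFT needs ~48 MB
```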

