mersenneforum.org  

Old 2020-10-03, 19:16   #12
Xyzzy
 
"Mike"
Aug 2002

2⁷·3²·7 Posts

Quote:
Originally Posted by Xyzzy
We turned off the short term "turbo" thingie.

It may be obvious that this turbo "thingie" really messes up benchmarking. (It wasn't obvious to us!) You think you are getting a certain speed during the benchmark but it is not sustainable.

It would be neat to have a system where, when nobody is using it interactively, the TDP could drop to whatever setting gives the best throughput per watt. Intel's XTU already has a deal where you can tie various power/speed profiles to certain programs, so you could set the system to say 25W overall and boost the TDP for games and stuff.

Personally, we are most interested in maximum throughput per watt.

Old 2020-10-03, 19:41   #13
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

47×101 Posts

Quote:
Originally Posted by Xyzzy
Personally, we are most interested in maximum throughput per watt.

Likewise, but the "per watt" should be for the whole system, right? A wall-plug meter like a Kill A Watt shows how much power the machine is using, which you can compare to production. Dropping the CPU power 20% and getting 10% less compute may be a loss per watt, since the rest of the system has a nearly fixed power draw.

Then again, if you're using a GPU for production, which computation engine to assign the overhead to gets complicated.
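
To put rough numbers on the whole-system point, here is a minimal Python sketch. Every figure in it is hypothetical (a 100 W CPU at full power, a 20% power cut costing 10% of throughput, and a few guesses for the fixed draw of the rest of the box); none of it was measured on the machines in this thread.

```python
def throughput_per_watt(throughput, cpu_watts, rest_watts):
    """Work done per watt measured at the wall: CPU plus everything else."""
    return throughput / (cpu_watts + rest_watts)

# All figures below are hypothetical, chosen only to illustrate the argument.
FULL = {"throughput": 100.0, "cpu_watts": 100.0}
CAPPED = {"throughput": 90.0, "cpu_watts": 80.0}  # 20% less power, 10% less compute

def capping_helps(rest_watts):
    """True if the capped setting wins on whole-system throughput per watt."""
    full = throughput_per_watt(FULL["throughput"], FULL["cpu_watts"], rest_watts)
    capped = throughput_per_watt(CAPPED["throughput"], CAPPED["cpu_watts"], rest_watts)
    return capped > full

# Try a CPU-only view (0 W), a lean system, and a system with heavy fixed draw.
for rest_watts in (0.0, 40.0, 150.0):
    print(f"rest of system {rest_watts:5.1f} W -> capping helps: {capping_helps(rest_watts)}")
```

With these made-up inputs the break-even sits at 100 W of fixed draw: below that the capped setting wins at the wall, above it the cap is a net loss.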
Old 2020-10-03, 21:38   #14
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11661₈ Posts

Quote:
Originally Posted by VBCurtis
Then again, if you're using a GPU for production, which computation engine to assign the overhead to gets complicated.

Not really, if the choice is between CPU-only computing and adding a GPU and doing both. Then only the marginal cost of the GPU and of powering it matters to GPU cost effectiveness. It's the same logic as deeply discounted airfares: the airline moves the last seat from airport to airport whether it's filled or empty, and is financially ahead even if it only recovers part of the prorated cost, as long as the fare covers more than the incremental cost of filling the seat and hauling the additional weight. The cost of the crew is unchanged, and the change in aerodynamic drag cost is probably quite small.
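
A small Python sketch of the marginal-versus-prorated distinction follows. The wattages and throughput units are invented for illustration, and the function names are hypothetical.

```python
def marginal_efficiency(added_throughput, added_watts):
    """Throughput per watt counting only the power the GPU itself adds."""
    return added_throughput / added_watts

def prorated_efficiency(added_throughput, added_watts, base_watts, share):
    """Same, but also charging the GPU a share (0..1) of the base system power."""
    return added_throughput / (added_watts + share * base_watts)

# Hypothetical figures: the base system draws 100 W whether or not a GPU is
# installed; the GPU adds 150 W and 300 units of throughput.
BASE_WATTS = 100.0
GPU_WATTS = 150.0
GPU_THROUGHPUT = 300.0

print(marginal_efficiency(GPU_THROUGHPUT, GPU_WATTS))                   # 2.0
print(prorated_efficiency(GPU_THROUGHPUT, GPU_WATTS, BASE_WATTS, 0.5))  # 1.5
```

If the base system runs regardless, the first figure is the one that answers "is adding the GPU worth it"; the second only matters when deciding whether to run the box at all.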

Last fiddled with by kriesel on 2020-10-03 at 21:41
Old 2020-10-03, 22:29   #15
Xyzzy
 
"Mike"
Aug 2002

2⁷·3²·7 Posts

It is true that we have this computer running 24×7 anyways, so the base power cost is kinda free.

We would have to factor in whole system power if we ran a bunch of other boxes.

Right now we only have the i9 for our daily driver and an i7 "NUC" for msieve work and file sharing. We set the i7 to run at 1GHz so it is extremely low power at ~10W for the whole system.
Old 2020-10-04, 02:55   #16
Xyzzy
 
"Mike"
Aug 2002

2⁷×3²×7 Posts

We happen to have a new i5-10600K as well.

Since it has fewer cores and since it is much cheaper, we are willing to get a little silly with it. Nothing dangerous like changing voltages or stuff like that. Silly as in removing the power and current limiters. (The Intel limits are 125W PL1 and 182W PL2.)

With those removed, the CPU will crank out 127W doing a six core PRP test. It can actually just barely do that test with the limits in place. The power limiter will blink on and off occasionally. With a twelve thread torture test set to "small FFT" and all of the AVX options enabled the CPU will use 157W. (!)

Since no ordinary workload will ever approach that level of stress, we feel confident that the power and current limits could be eliminated for an everyday PC using that CPU, if the cooling is up to the task and if the VRMs and power stage are solid.

Our feeling after playing around with both CPUs is that the six core variant makes a lot of sense. At max draw it uses ~26W per core. If you tried to do 26W per core on a ten core processor that would be 260W, which isn't trivial. Also, you are probably going to hit the memory bandwidth limit with a four core part, so more cores don't seem to scale super well. (We wish there was a four core i3 "K" part available.)
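
The per-core arithmetic can be checked with a few lines of Python. This sketch assumes the "max draw" is the 157W torture-test figure quoted above and that power scales linearly with core count, which is a simplification.

```python
# Figures taken from the post: 157 W package power on the six core i5
# during the small-FFT torture test.
MAX_PACKAGE_WATTS = 157
I5_CORES = 6
I9_CORES = 10

watts_per_core = MAX_PACKAGE_WATTS / I5_CORES   # a little over 26 W
# Assumption: the same per-core draw on a ten core part (linear scaling).
projected_i9_watts = round(watts_per_core) * I9_CORES

print(f"{watts_per_core:.1f} W per core, about {projected_i9_watts} W for ten cores")
```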

Now the ten core processor might make a bit of sense if you undervolt it or run it in a reduced power setting. Then you could utilize the "burst capacity" for transient CPU loads. However, in real life use, we are unable to discern which CPU is installed so we will probably sell the i9 and keep the i5.

We will post some benchmarks for the i5 in a while.

Attached thumbnail: torture-test.png
Old 2020-10-04, 03:19   #17
Xyzzy
 
"Mike"
Aug 2002

2⁷×3²×7 Posts

Code:
Timings for 6144K FFT length (1 core, 1 worker): 19.77 ms.  Throughput: 50.58 iter/sec.
Timings for 6144K FFT length (2 cores, 1 worker): 10.62 ms.  Throughput: 94.14 iter/sec.
Timings for 6144K FFT length (2 cores, 2 workers): 21.45, 21.17 ms.  Throughput: 93.86 iter/sec.
Timings for 6144K FFT length (3 cores, 1 worker):  7.93 ms.  Throughput: 126.17 iter/sec.
Timings for 6144K FFT length (3 cores, 2 workers): 24.92, 12.03 ms.  Throughput: 123.25 iter/sec.
Timings for 6144K FFT length (3 cores, 3 workers): 24.75, 24.37, 24.40 ms.  Throughput: 122.44 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker):  7.42 ms.  Throughput: 134.72 iter/sec.
Timings for 6144K FFT length (4 cores, 2 workers): 15.94, 15.78 ms.  Throughput: 126.11 iter/sec.
Timings for 6144K FFT length (4 cores, 3 workers): 32.56, 31.77, 15.85 ms.  Throughput: 125.30 iter/sec.
Timings for 6144K FFT length (4 cores, 4 workers): 33.19, 33.04, 32.08, 31.76 ms.  Throughput: 123.05 iter/sec.
Timings for 6144K FFT length (5 cores, 1 worker):  7.70 ms.  Throughput: 129.80 iter/sec.
Timings for 6144K FFT length (5 cores, 2 workers): 21.23, 13.16 ms.  Throughput: 123.06 iter/sec.
Timings for 6144K FFT length (5 cores, 3 workers): 42.13, 20.60, 20.28 ms.  Throughput: 121.58 iter/sec.
Timings for 6144K FFT length (5 cores, 4 workers): 41.71, 43.03, 42.17, 20.36 ms.  Throughput: 120.05 iter/sec.
Timings for 6144K FFT length (5 cores, 5 workers): 42.24, 42.00, 40.81, 40.18, 41.15 ms.  Throughput: 121.18 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker):  8.14 ms.  Throughput: 122.87 iter/sec.
Timings for 6144K FFT length (6 cores, 2 workers): 16.92, 16.63 ms.  Throughput: 119.25 iter/sec.
Timings for 6144K FFT length (6 cores, 3 workers): 25.61, 25.76, 25.08 ms.  Throughput: 117.73 iter/sec.
Timings for 6144K FFT length (6 cores, 4 workers): 54.54, 50.87, 25.39, 25.19 ms.  Throughput: 117.08 iter/sec.
Timings for 6144K FFT length (6 cores, 5 workers): 51.48, 52.24, 50.63, 52.32, 24.79 ms.  Throughput: 117.77 iter/sec.
Timings for 6144K FFT length (6 cores, 6 workers): 51.73, 52.46, 50.84, 50.08, 49.47, 50.72 ms.  Throughput: 117.96 iter/sec.
Code:
Timings for 560K FFT length (1 core, 1 worker):  1.57 ms.  Throughput: 636.72 iter/sec.
Timings for 560K FFT length (2 cores, 1 worker):  0.85 ms.  Throughput: 1177.30 iter/sec.
Timings for 560K FFT length (2 cores, 2 workers):  1.63,  1.61 ms.  Throughput: 1235.07 iter/sec.
Timings for 560K FFT length (3 cores, 1 worker):  0.59 ms.  Throughput: 1698.31 iter/sec.
Timings for 560K FFT length (3 cores, 2 workers):  1.64,  0.85 ms.  Throughput: 1779.37 iter/sec.
Timings for 560K FFT length (3 cores, 3 workers):  1.69,  1.67,  1.67 ms.  Throughput: 1790.08 iter/sec.
Timings for 560K FFT length (4 cores, 1 worker):  0.45 ms.  Throughput: 2209.24 iter/sec.
Timings for 560K FFT length (4 cores, 2 workers):  0.88,  0.86 ms.  Throughput: 2299.87 iter/sec.
Timings for 560K FFT length (4 cores, 3 workers):  1.73,  1.71,  0.87 ms.  Throughput: 2304.95 iter/sec.
Timings for 560K FFT length (4 cores, 4 workers):  2.21,  2.18,  2.18,  2.16 ms.  Throughput: 1834.03 iter/sec.
Timings for 560K FFT length (5 cores, 1 worker):  0.37 ms.  Throughput: 2668.52 iter/sec.
Timings for 560K FFT length (5 cores, 2 workers):  0.89,  0.60 ms.  Throughput: 2788.76 iter/sec.
Timings for 560K FFT length (5 cores, 3 workers):  1.93,  0.97,  0.94 ms.  Throughput: 2609.61 iter/sec.
Timings for 560K FFT length (5 cores, 4 workers):  2.44,  2.43,  2.42,  1.07 ms.  Throughput: 2175.21 iter/sec.
Timings for 560K FFT length (5 cores, 5 workers):  3.30,  3.03,  3.01,  3.00,  2.99 ms.  Throughput: 1632.28 iter/sec.
Timings for 560K FFT length (6 cores, 1 worker):  0.32 ms.  Throughput: 3083.32 iter/sec.
Timings for 560K FFT length (6 cores, 2 workers):  0.62,  0.61 ms.  Throughput: 3242.26 iter/sec.
Timings for 560K FFT length (6 cores, 3 workers):  1.28,  1.25,  1.22 ms.  Throughput: 2401.40 iter/sec.
Timings for 560K FFT length (6 cores, 4 workers):  3.24,  3.07,  1.31,  1.31 ms.  Throughput: 2163.49 iter/sec.
Timings for 560K FFT length (6 cores, 5 workers):  3.60,  3.45,  3.43,  3.42,  1.34 ms.  Throughput: 1899.57 iter/sec.
Timings for 560K FFT length (6 cores, 6 workers):  3.99,  3.98,  4.02,  3.96,  3.93,  4.00 ms.  Throughput: 1507.35 iter/sec.
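
For anyone who wants to slice output like the above, here is a rough Python sketch that parses benchmark lines and picks the best worker count for a given core count. The regular expression is written against the lines as pasted in this post, not against any documented Prime95 format, and the helper names are made up. It also shows that the reported throughput is approximately the sum of 1000/ms over the workers; small differences come from the timings being rounded to two decimals.

```python
import re

# Pattern fitted to the lines pasted in this thread (an assumption, not a spec).
LINE_RE = re.compile(
    r"Timings for (\d+)K FFT length \((\d+) cores?, (\d+) workers?\):\s*"
    r"([\d.,\s]+?) ms\.\s+Throughput: ([\d.]+) iter/sec\."
)

def parse_line(line):
    """Return a dict for one benchmark line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "fft_k": int(m.group(1)),
        "cores": int(m.group(2)),
        "workers": int(m.group(3)),
        "timings_ms": [float(t) for t in m.group(4).split(",")],
        "reported": float(m.group(5)),
    }

def throughput_from_timings(timings_ms):
    """Each worker finishes 1000/ms iterations per second; add them up."""
    return sum(1000.0 / t for t in timings_ms)

SAMPLE = """\
Timings for 6144K FFT length (2 cores, 2 workers): 21.45, 21.17 ms.  Throughput: 93.86 iter/sec.
Timings for 560K FFT length (6 cores, 1 worker):  0.32 ms.  Throughput: 3083.32 iter/sec.
Timings for 560K FFT length (6 cores, 2 workers):  0.62,  0.61 ms.  Throughput: 3242.26 iter/sec.
Timings for 560K FFT length (6 cores, 3 workers):  1.28,  1.25,  1.22 ms.  Throughput: 2401.40 iter/sec.
"""

rows = [r for r in (parse_line(l) for l in SAMPLE.splitlines()) if r is not None]

# Best worker count for the six core 560K runs, by reported throughput.
six_core_560k = [r for r in rows if r["fft_k"] == 560 and r["cores"] == 6]
best = max(six_core_560k, key=lambda r: r["reported"])
print(best["workers"], best["reported"])
```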
Old 2020-10-04, 06:35   #18
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

47·101 Posts

Quote:
Originally Posted by Xyzzy
We will post some benchmarks for the i5 in a while.

If you have time, I am interested to see how the msieve benchmark we set up last winter goes.
Old 2020-10-05, 03:01   #19
Xyzzy
 
"Mike"
Aug 2002

2⁷×3²×7 Posts

As an experiment we tested to see how far a simple overclock on the i5 would go.

We ended up at 4.8GHz on all cores with an AVX load but the CPU was using 200W to achieve it!

Old 2020-10-05, 13:13   #20
Xyzzy
 
"Mike"
Aug 2002

2⁷·3²·7 Posts

Quote:
Originally Posted by Xyzzy
With the 250W limiter a different limit kicks in at around 145W. It is called the "current/EDP" limit. We haven't messed around with changing that yet. It sounds kinda scary.

So if you remove the "current/EDP" limiter and run a 20 thread torture test with small FFTs and all of the AVX options turned on, the i9 hits 302 watts! Amazingly, we can run that sustained with our cooling setup, but the CPU goes up to 85C and the VRMs go up to 82C. We stopped after five minutes just to be safe, and we don't plan on doing this again!

Old 2020-10-05, 13:20   #21
Xyzzy
 
"Mike"
Aug 2002

8064₁₀ Posts

Here are benchmarks for 560K and 6M FFTs with all limits removed:
Code:
Timings for 560K FFT length (1 core, 1 worker):  1.44 ms.  Throughput: 695.56 iter/sec.
Timings for 560K FFT length (2 cores, 1 worker):  0.79 ms.  Throughput: 1265.70 iter/sec.
Timings for 560K FFT length (3 cores, 1 worker):  0.56 ms.  Throughput: 1785.49 iter/sec.
Timings for 560K FFT length (4 cores, 1 worker):  0.43 ms.  Throughput: 2335.99 iter/sec.
Timings for 560K FFT length (5 cores, 1 worker):  0.35 ms.  Throughput: 2829.04 iter/sec.
Timings for 560K FFT length (6 cores, 1 worker):  0.31 ms.  Throughput: 3206.84 iter/sec.
Timings for 560K FFT length (7 cores, 1 worker):  0.28 ms.  Throughput: 3593.28 iter/sec.
Timings for 560K FFT length (8 cores, 1 worker):  0.25 ms.  Throughput: 3997.59 iter/sec.
Timings for 560K FFT length (9 cores, 1 worker):  0.27 ms.  Throughput: 3708.03 iter/sec.
Timings for 560K FFT length (10 cores, 1 worker):  0.27 ms.  Throughput: 3703.69 iter/sec.

Timings for 6144K FFT length (1 core, 1 worker): 18.25 ms.  Throughput: 54.79 iter/sec.
Timings for 6144K FFT length (2 cores, 1 worker):  9.74 ms.  Throughput: 102.63 iter/sec.
Timings for 6144K FFT length (3 cores, 1 worker):  7.48 ms.  Throughput: 133.72 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker):  6.84 ms.  Throughput: 146.18 iter/sec.
Timings for 6144K FFT length (5 cores, 1 worker):  6.87 ms.  Throughput: 145.50 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker):  7.04 ms.  Throughput: 141.98 iter/sec.
Timings for 6144K FFT length (7 cores, 1 worker):  7.27 ms.  Throughput: 137.47 iter/sec.
Timings for 6144K FFT length (8 cores, 1 worker):  7.50 ms.  Throughput: 133.37 iter/sec.
Timings for 6144K FFT length (9 cores, 1 worker):  7.65 ms.  Throughput: 130.72 iter/sec.
Timings for 6144K FFT length (10 cores, 1 worker):  7.84 ms.  Throughput: 127.56 iter/sec.
It looks like the memory bandwidth is saturated at 8 cores for the 560K FFT and at 4 cores for the 6M FFT.
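
One way to read the scaling off the numbers above is to find the core count with peak throughput and the parallel efficiency at that point. The Python sketch below uses the single-worker throughputs copied from the listing; it only shows where scaling stops, and says nothing about whether memory bandwidth or something else is the cause.

```python
# Single-worker throughput (iter/sec) for 1..10 cores, copied from the listing.
THROUGHPUT_560K = [695.56, 1265.70, 1785.49, 2335.99, 2829.04,
                   3206.84, 3593.28, 3997.59, 3708.03, 3703.69]
THROUGHPUT_6144K = [54.79, 102.63, 133.72, 146.18, 145.50,
                    141.98, 137.47, 133.37, 130.72, 127.56]

def peak_cores(throughputs):
    """Core count (1-based) at which throughput is highest."""
    best_index = max(range(len(throughputs)), key=lambda i: throughputs[i])
    return best_index + 1

def parallel_efficiency(throughputs, cores):
    """Throughput at `cores` relative to perfect scaling from one core."""
    return throughputs[cores - 1] / (cores * throughputs[0])

for name, data in (("560K", THROUGHPUT_560K), ("6144K", THROUGHPUT_6144K)):
    n = peak_cores(data)
    print(f"{name}: peak at {n} cores, efficiency {parallel_efficiency(data, n):.2f}")
```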
Old 2020-10-05, 14:00   #22
axn
 
Jun 2003

4939₁₀ Posts

Quote:
Originally Posted by Xyzzy
It looks like the memory bandwidth is saturated at 8 cores for the 560K FFT and at 4 cores for the 6M FFT.

It's unlikely that the 560K result has anything to do with memory. The i9 is a 20MB L3 part, so the whole FFT should be running entirely from L3 and shouldn't be hitting RAM at all. We are probably just seeing multi-threading overhead (particularly since it is such a small FFT size).

Up to a 20/8 = 2.5M FFT (or slightly smaller to account for other overheads) should run entirely within L3.
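
The 20/8 estimate can be sketched in Python as follows. It assumes 8 bytes per FFT element (one double), treats 20MB as 20 MiB, and ignores twiddle tables and everything else competing for cache, so the real limit would be somewhat lower.

```python
BYTES_PER_ELEMENT = 8            # assumption: one double per FFT element
L3_BYTES = 20 * 1024 * 1024      # 20MB L3, treated as 20 MiB here

def fft_bytes(fft_len_k):
    """Approximate data footprint of an FFT whose length is given in K."""
    return fft_len_k * 1024 * BYTES_PER_ELEMENT

def fits_in_l3(fft_len_k):
    return fft_bytes(fft_len_k) <= L3_BYTES

# Largest FFT length (in K) whose data alone fits: 20 MiB / 8 bytes = 2560K = 2.5M.
max_fft_k = L3_BYTES // (1024 * BYTES_PER_ELEMENT)

for k in (560, 6144):
    print(f"{k}K FFT: {fft_bytes(k) / 2**20:.1f} MiB, fits in L3: {fits_in_l3(k)}")
```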

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.