
2016-12-26, 06:01   #727
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

101101110001₂ Posts

I wanted to see how changing the CPU frequency would affect the power consumption of my i5-6600 DDR4-2133 compute cluster. I created a spreadsheet with timings of 4 threads and 4 workers. We've noticed before Skylake gets more throughput using 4 threads instead of 4 workers, but it turns out that changes with CPU frequency. When I underclock the CPU to 3.3 GHz, using 4 workers begins to be more performant.

Spreadsheet.

Underclocking is proving to be an efficiency win. I haven't yet played with undervolting. I also intend to gather more data while running 3 threads/workers. So far 4 threads/workers at 3.3 GHz is faster than 3 threads at 3.7 GHz. I haven't yet disabled a core in the BIOS to check power consumption differences. I strongly suspect 4 cores will win.

2016-12-26, 09:07   #728
S485122

Sep 2006
Brussels, Belgium

3×7×79 Posts

Quote:
 Originally Posted by Mark Rose ... Underclocking is proving to be an efficiency win. I haven't yet played with undervolting. ...
This is why the low-voltage versions of those CPUs are interesting: the i5-6600T, for instance.

Jacob

2016-12-26, 13:43   #729
axn

Jun 2003

13²·29 Posts

Quote:
 Originally Posted by Mark Rose We've noticed before Skylake gets more throughput using 4 threads instead of 4 workers, but it turns out that changes with CPU frequency. When I underclock the CPU to 3.3 GHz, using 4 workers begins to be more performant.
Based on the data, I think this statement is too strong. Even at 3.3 GHz, 1 worker is more performant than 4 workers, except for 3 FFTs where the difference is so small as to be negligible. Of course, the trend is there, so maybe 3.2 GHz and below might show a clear superiority for the 4 workers.

2016-12-26, 14:03   #730
lavalamp

Oct 2007
Manchester, UK

17·79 Posts

Of course, the real metric should be performance / total cost of ownership.

Add up the cost of the computer, as well as the cost of all previously used, and expected future use of electricity (based on chosen clock speed). Optionally also include the cost of your time working on setting up and configuring the machine. :)

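The performance / total cost of ownership metric can be sketched in a few lines of Python. This is only a rough illustration: the function name `iterations_per_dollar`, its parameters, and every number in the usage example are hypothetical and not taken from the thread's spreadsheet.

```python
def iterations_per_dollar(throughput_iter_per_sec, hardware_cost, watts,
                          price_per_kwh, years, setup_cost=0.0):
    """Total iterations completed divided by total cost of ownership.

    Assumes the machine runs 24/7 at constant power draw and throughput
    for the whole period (a simplification).
    """
    hours = years * 365 * 24
    energy_cost = watts / 1000 * hours * price_per_kwh  # kWh * price
    total_cost = hardware_cost + energy_cost + setup_cost
    total_iterations = throughput_iter_per_sec * hours * 3600
    return total_iterations / total_cost


if __name__ == "__main__":
    # Hypothetical node: compare a stock clock against an underclock that
    # trades a little throughput for lower power draw.
    stock = iterations_per_dollar(100.0, 600.0, 92.5, 0.185, 2.5)
    underclocked = iterations_per_dollar(95.0, 600.0, 80.0, 0.185, 2.5)
    print(f"stock: {stock:.0f} iter/$, underclocked: {underclocked:.0f} iter/$")
```

Because the hardware cost is fixed, a power saving moves this metric less than it moves running cost alone, which is the point of using total cost of ownership.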
2016-12-26, 14:28   #731
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

2·3·1,019 Posts

Quote:
 Originally Posted by lavalamp Add up the cost of the computer, as well as the cost of all previously used, and expected future use of electricity (based on chosen clock speed). Optionally also include the cost of your time working on setting up and configuring the machine. :)
And if you lose one week of work trying to figure out the sweet spot and gain 3% then you have to work 24/7 for 32 weeks just to break even.
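The break-even point can be estimated with a very simple model: the weeks of tuned running needed for the extra output to equal the time lost. The helper name `break_even_weeks` is made up for this sketch, and the model ignores any work done while tuning.

```python
def break_even_weeks(weeks_lost, fractional_gain):
    """Weeks of running at the improved rate needed to recover the lost time."""
    if fractional_gain <= 0:
        raise ValueError("gain must be positive to ever break even")
    return weeks_lost / fractional_gain


if __name__ == "__main__":
    # One week lost, 3% gained afterwards.
    print(round(break_even_weeks(1, 0.03), 1))
```

Under this model one lost week at a 3% gain takes roughly 33 weeks to recover, in the same range as the 32 weeks quoted above.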

2016-12-26, 19:30   #732
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

2929₁₀ Posts

Quote:
 Originally Posted by axn Based on the data, I think this statement is too strong. Even at 3.3 GHz, 1 worker is more performant than 4 workers, except for 3 FFTs where the difference is so small as to be negligible. Of course, the trend is there, so maybe 3.2 GHz and below might show a clear superiority for the 4 workers.
If you plot the slopes of iter/sec/ghz, the threads slope is steeper. I'll take more measurements and update the thread when done.

Most impressive to me so far is that I can run at 3.3 GHz for 95% performance for 86% of running costs, before undervolting.
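As a quick check of what "95% performance for 86% of running costs" means for efficiency, the ratio of the two gives the work done per unit of running cost relative to stock. The function name below is illustrative only.

```python
def relative_efficiency(performance_fraction, cost_fraction):
    """Work per unit of running cost, relative to the stock configuration."""
    return performance_fraction / cost_fraction


if __name__ == "__main__":
    gain = relative_efficiency(0.95, 0.86)
    print(f"{(gain - 1) * 100:.1f}% more work per unit of running cost")
```

With the figures quoted above, this comes to about 10% more work per unit of running cost at 3.3 GHz.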

Quote:
 Originally Posted by lavalamp Of course, the real metric should be performance / total cost of ownership. Add up the cost of the computer, as well as the cost of all previously used, and expected future use of electricity (based on chosen clock speed). Optionally also include the cost of your time working on setting up and configuring the machine. :)
It would have been much more cost efficient to go with the i5-6400, but I am hoping the i5-6600 will have better resale value in the future. Likewise I went with 32 GB of RAM. The additional $100/node also gives me more flexibility if I want to use the cluster for anything else. The four nodes share a single power supply.

Quote:
 Originally Posted by retina And if you lose one week of work trying to figure out the sweet spot and gain 3% then you have to work 24/7 for 32 weeks just to break even.
It would be hard to lose a week of work. I'm only experimenting on a single node, and when I am measuring the power draw I'm doing DC. Testing stability with undervolting will certainly take more time!

2016-12-26, 23:28   #733
lavalamp

Oct 2007
Manchester, UK

17·79 Posts

Quote:
 Originally Posted by Mark Rose It would have been much more cost efficient to go with the i5-6400, but I am hoping the i5-6600 will have better resale value in the future.
Unless you plan on selling in the near future, I would have thought the resale value on these would be essentially nil. Intel will release the 7xxx series early next year, and then (at least they claim) the 8xxx series later in the year, meaning you'll already be 2 generations behind.

How long do you plan on running these machines, and what do you project an extra ~10W / machine will cost over such a time?

Edit: A rough calculation for 4 machines. 10W / machine is ~1kWh / day, and based on these figures electricity costs 11.16 c/kWh in Toronto now, but I'll round it up to 12 c/kWh. Over 4 years that comes to $175.

Last fiddled with by lavalamp on 2016-12-26 at 23:35

2016-12-27, 02:58   #734
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

29×101 Posts

Unfortunately I wasn't able to underclock my CPU. I didn't know the fixed multiplier also prevents underclocking. I have yet to try undervolting.

Quote:
 Originally Posted by lavalamp Unless you plan on selling in the near-future, I would have thought the resale value on these would be essentially nil. Intel will release the 7xxx series early next year, and then (at least they claim) the 8xxx series later in the year, meaning you'll already be 2 generations behind.
There is a vibrant used component market here. I intend to sell the whole lot to a used system builder. Of greater concern with selling is what AMD Ryzen will do.

Quote:
 How long do you plan on running these machines, and what do you project an extra ~10W / machine will cost over such a time? Edit: A rough calculation for 4 machines. 10W / machine is ~1kWh / day, and based on these figures electricity costs 11.16 c/kWh in Toronto now, but I'll round it up to 12 c/kWh. Over 4 years that comes to $175.
I wish electricity were so cheap. The actual electricity cost, once all the taxes, fees, etc., are factored in, is 17.7¢/kWh for the first 600 kWh and 19.5¢/kWh after. In the cooler months when not using AC, I've been using about 550 kWh with the cluster running. This past summer my highest monthly usage was about 1250 kWh. For simplicity I'll say the cluster incrementally costs me 18.5¢/kWh.

The cluster, not including the Ethernet switch, consumes 370 watts when running at stock clocks. That's 266 kWh/month, or $591/year. So if I can save 50 watts across the cluster, that's 36 kWh/month, or $80/year. I figure I'll run the cluster at most 2.5 years more.
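The running-cost figures above can be reproduced with a short Python sketch. It assumes a 30-day month (which matches the 266 kWh/month figure) and a flat 18.5¢/kWh; the function names are made up for illustration.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month


def monthly_kwh(watts):
    """Energy used in one month of continuous running, in kWh."""
    return watts * HOURS_PER_MONTH / 1000


def yearly_cost(watts, cents_per_kwh):
    """Cost in dollars of running a constant load for twelve 30-day months."""
    return monthly_kwh(watts) * 12 * cents_per_kwh / 100


if __name__ == "__main__":
    for label, watts in (("whole cluster", 370), ("50 W saving", 50)):
        print(f"{label}: {monthly_kwh(watts):.0f} kWh/month, "
              f"${yearly_cost(watts, 18.5):.0f}/year")
```

Over the 2.5 years mentioned, a 50 W saving would be worth about 2.5 × $80 = $200 at this rate.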

2017-02-14, 17:15   #736

"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,693 Posts

Quote:
 TL;DR: get dual rank memory.
I wish it were easier to determine which parts are dual.

2017-03-08, 22:29   #737
VictordeHolland

"Victor de Hollander"
Aug 2011
the Netherlands

1176₁₀ Posts

Just for fun I ran the Prime95 benchmark on my old 11'' netbook with AMD E-350 @1.6GHz

Code:
AMD E-350 Processor
CPU speed: 1596.06 MHz, 2 cores
CPU features: 3DNow! Prefetch, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 40
L2 TLBS: 512
Prime95 32-bit version 28.10, RdtscTiming=1
Best time for 1024K FFT length: 99.916 ms., avg: 103.876 ms.
Best time for 1280K FFT length: 142.989 ms., avg: 149.942 ms.
Best time for 1536K FFT length: 167.401 ms., avg: 174.861 ms.
Best time for 1792K FFT length: 196.051 ms., avg: 213.505 ms.
Best time for 2048K FFT length: 222.136 ms., avg: 231.517 ms.
Best time for 2560K FFT length: 275.761 ms., avg: 289.138 ms.
Best time for 3072K FFT length: 347.029 ms., avg: 357.043 ms.
Best time for 3584K FFT length: 403.094 ms., avg: 417.261 ms.
Best time for 4096K FFT length: 458.478 ms., avg: 477.660 ms.
Best time for 5120K FFT length: 645.214 ms., avg: 656.045 ms.
Best time for 6144K FFT length: 778.602 ms., avg: 801.066 ms.
Best time for 7168K FFT length: 936.351 ms., avg: 955.565 ms.
Best time for 8192K FFT length: 1056.663 ms., avg: 1081.029 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 50.489 ms., avg: 51.903 ms.
Best time for 1280K FFT length: 71.734 ms., avg: 74.501 ms.
Best time for 1536K FFT length: 83.726 ms., avg: 86.498 ms.
Best time for 1792K FFT length: 98.136 ms., avg: 101.268 ms.
Best time for 2048K FFT length: 111.295 ms., avg: 113.926 ms.
Best time for 2560K FFT length: 137.782 ms., avg: 144.518 ms.
Best time for 3072K FFT length: 172.050 ms., avg: 178.164 ms.
Best time for 3584K FFT length: 203.060 ms., avg: 208.392 ms.
Best time for 4096K FFT length: 230.258 ms., avg: 236.426 ms.
Best time for 5120K FFT length: 319.971 ms., avg: 327.913 ms.
Best time for 6144K FFT length: 397.567 ms., avg: 418.475 ms.
Best time for 7168K FFT length: 477.554 ms., avg: 487.654 ms.
Best time for 8192K FFT length: 524.522 ms., avg: 534.575 ms.
Timings for 1024K FFT length (2 cpus, 2 workers): 104.66, 98.42 ms. Throughput: 19.72 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 149.56, 140.02 ms. Throughput: 13.83 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 176.08, 163.24 ms. Throughput: 11.81 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 198.76, 186.35 ms. Throughput: 10.40 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 223.87, 209.46 ms. Throughput: 9.24 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 288.10, 267.54 ms. Throughput: 7.21 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 356.97, 335.54 ms. Throughput: 5.78 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 429.52, 401.99 ms. Throughput: 4.82 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 497.72, 464.37 ms. Throughput: 4.16 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 695.86, 655.48 ms. Throughput: 2.96 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 792.26, 738.64 ms. Throughput: 2.62 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 986.58, 920.18 ms. Throughput: 2.10 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 1096.14, 1023.91 ms. Throughput: 1.89 iter/sec.
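The "Throughput" figures in the multi-worker lines appear to be the sum of each worker's iteration rate, i.e. 1000 divided by its per-iteration time in milliseconds. A small Python sketch of that reading, with a hypothetical helper name:

```python
def throughput_iter_per_sec(worker_times_ms):
    """Combined iterations per second for workers running concurrently."""
    return sum(1000.0 / t for t in worker_times_ms)


if __name__ == "__main__":
    # Per-worker timings copied from the 2-worker benchmark lines above.
    for fft, times in (("1024K", [104.66, 98.42]),
                       ("2048K", [223.87, 209.46]),
                       ("8192K", [1096.14, 1023.91])):
        print(f"{fft}: {throughput_iter_per_sec(times):.2f} iter/sec")
```

Rounded to two decimals, these three results match the throughput values Prime95 reported for the same FFT lengths.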
