2016-12-26, 06:01  #727
"/X\(‘‘)/X\"
Jan 2013
101101110001_{2} Posts 
I wanted to see how changing the CPU frequency would affect the power consumption of my i5-6600 DDR4-2133 compute cluster. I created a spreadsheet with timings of 4 threads and 4 workers.
We've noticed before that Skylake gets more throughput using 4 threads instead of 4 workers, but it turns out that changes with CPU frequency. When I underclock the CPU to 3.3 GHz, using 4 workers begins to be more performant. Underclocking is proving to be an efficiency win. I haven't yet played with undervolting. I also intend to gather more data while running 3 threads/workers. So far 4 threads/workers at 3.3 GHz is faster than 3 threads at 3.7 GHz. I haven't yet disabled a core in the BIOS to check power consumption differences. I strongly suspect 4 cores will win.
2016-12-26, 09:07  #728
Sep 2006
Brussels, Belgium
3×7×79 Posts 

2016-12-26, 13:43  #729
Jun 2003
13^{2}·29 Posts 
Based on the data, I think this statement is too strong. Even at 3.3 GHz, 1 worker is more performant than 4 workers, except for 3 FFTs where the difference is so small as to be negligible. Of course, the trend is there, so maybe 3.2 GHz and below might show a clear superiority for the 4 workers.

2016-12-26, 14:03  #730
Oct 2007
Manchester, UK
17·79 Posts 
Of course, the real metric should be performance / total cost of ownership.
Add up the cost of the computer and the cost of electricity, both what has already been used and what you expect to use in future (based on the chosen clock speed). Optionally also include the cost of your time spent setting up and configuring the machine. :)
2016-12-26, 14:28  #731
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2·3·1,019 Posts 
And if you lose one week of work trying to figure out the sweet spot and gain 3% then you have to work 24/7 for 32 weeks just to break even.
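As a rough check of the break-even arithmetic above, here is a minimal sketch. It assumes the lost time and the gain are both measured in weeks of baseline throughput; the function name is made up for illustration.

```python
def break_even_weeks(lost_weeks: float, gain: float) -> float:
    """Weeks of running at the improved rate needed to recover lost_weeks.

    Each week at the improved rate yields `gain` extra weeks of baseline
    work, so the lost time is recovered after lost_weeks / gain weeks.
    """
    if gain <= 0:
        raise ValueError("gain must be positive")
    return lost_weeks / gain


weeks = break_even_weeks(1.0, 0.03)
print(f"{weeks:.1f} weeks to break even")
```

For one lost week and a 3% gain this works out to about 33 weeks, in line with the rough figure quoted.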

2016-12-26, 19:30  #732
"/X\(‘‘)/X\"
Jan 2013
2929_{10} Posts 
Quote:
Most impressive to me so far is that I can run at 3.3 GHz for 95% of the performance at 86% of the running costs, before undervolting.
Quote:
It would be hard to lose a week of work. I'm only experimenting on a single node, and when I am measuring the power draw I'm doing DC. Testing stability with undervolting will certainly take more time! 
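To put the 95% performance / 86% running cost figures in one number, a minimal sketch follows. It assumes running cost scales directly with power draw, and the function name is illustrative only.

```python
def work_per_unit_cost(relative_performance: float, relative_cost: float) -> float:
    """Ratio of work done per unit of running cost, relative to stock settings."""
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return relative_performance / relative_cost


ratio = work_per_unit_cost(0.95, 0.86)
print(f"{(ratio - 1) * 100:.1f}% more work per unit of running cost")
```

Under that assumption the underclocked setting does roughly 10% more work per unit of running cost.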

2016-12-26, 23:28  #733
Oct 2007
Manchester, UK
17·79 Posts 
Quote:
How long do you plan on running these machines, and what do you project an extra ~10 W per machine will cost over that time?
Edit: A rough calculation for 4 machines. 10 W per machine is ~1 kWh per day, and based on these figures electricity costs 11.16 c/kWh in Toronto now, but I'll round it up to 12 c/kWh. Over 4 years that comes to $175.
Last fiddled with by lavalamp on 2016-12-26 at 23:35
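The rough calculation above can be reproduced with a short sketch. The inputs (4 machines, 10 W each, 12 c/kWh, 4 years) come from the post; the function name and the 365-day year are assumptions made for illustration.

```python
def electricity_cost(machines: int, watts_each: float, years: float,
                     dollars_per_kwh: float, days_per_year: float = 365.0) -> float:
    """Cost in dollars of a constant extra power draw over the given period."""
    kwh_per_day = machines * watts_each * 24 / 1000  # W -> kWh per day
    return kwh_per_day * days_per_year * years * dollars_per_kwh


exact = electricity_cost(4, 10, 4, 0.12)
print(f"4 machines x 10 W for 4 years: ${exact:.0f}")
```

Without rounding, the daily figure is 0.96 kWh and the total is about $168; rounding up to 1 kWh per day, as in the post, gives about $175.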

2016-12-27, 02:58  #734
"/X\(‘‘)/X\"
Jan 2013
29×101 Posts 
Unfortunately I wasn't able to underclock my CPU. I didn't know the fixed multiplier also prevents underclocking. I have yet to try undervolting.
Quote:
Quote:
The cluster, not including the Ethernet switch, consumes 370 watts when running at stock clocks. That's 266 kWh/month, or $591/year. So if I can save 50 watts across the cluster, that's 36 kWh/month, or $80/year. I figure I'll run the cluster at most 2.5 years more. 
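The cluster figures above can be checked with a small sketch. It assumes a 30-day month, and it backs out the electricity rate implied by the quoted $591/year rather than using a rate stated in the post; the function names are illustrative.

```python
def kwh_per_month(watts: float, days: int = 30) -> float:
    """Energy used in a month by a constant load of `watts`."""
    return watts * 24 * days / 1000


def implied_rate(dollars_per_year: float, watts: float) -> float:
    """Dollars per kWh implied by a yearly bill for a constant load."""
    return dollars_per_year / (kwh_per_month(watts) * 12)


rate = implied_rate(591, 370)          # derived from the post's own figures
savings = kwh_per_month(50) * 12 * rate
print(f"370 W -> {kwh_per_month(370):.0f} kWh/month")
print(f"implied rate: {rate * 100:.1f} c/kWh")
print(f"saving 50 W -> ${savings:.0f}/year")
```

The implied rate of roughly 18.5 c/kWh is higher than the 12 c/kWh used earlier in the thread, which would be consistent with an all-in price that includes delivery and other charges, though the post does not say so.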

2017-02-14, 08:34  #735
"/X\(‘‘)/X\"
Jan 2013
29·101 Posts 
Single Rank versus Dual Rank DDR3
I now have two i3-4170 systems, both running 2x4 GB of DDR3-1600 RAM in dual channel configuration.

Dual rank:
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 4.355 ms., avg time: 4.413 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 5.554 ms., avg time: 5.562 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 6.644 ms., avg time: 6.651 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 8.000 ms., avg time: 8.007 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 9.072 ms., avg time: 9.084 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 11.528 ms., avg time: 11.545 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 13.875 ms., avg time: 13.886 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 16.453 ms., avg time: 16.466 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 18.976 ms., avg time: 18.991 ms.
[Work thread Feb 14 03:20] Timing FFTs using 2 threads on 2 physical CPUs.
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 2.401 ms., avg time: 2.417 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 3.047 ms., avg time: 3.114 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 3.651 ms., avg time: 3.732 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 5.132 ms., avg time: 5.152 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 5.079 ms., avg time: 5.102 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 6.309 ms., avg time: 6.408 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 7.651 ms., avg time: 7.779 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 9.211 ms., avg time: 9.278 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 10.455 ms., avg time: 10.505 ms.
[Work thread Feb 14 03:20] Benchmarking multiple workers to measure the impact of memory bandwidth
[Work thread Feb 14 03:20] Timing 1024K FFT, 2 cpus, 1 worker. Average times: 2.41 ms. Total throughput: 414.57 iter/sec.
[Work thread Feb 14 03:21] Timing 1024K FFT, 2 cpus, 2 workers. Average times: 4.79, 4.77 ms. Total throughput: 418.37 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 1 worker. Average times: 3.10 ms. Total throughput: 323.10 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 2 workers. Average times: 6.29, 6.00 ms. Total throughput: 325.74 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 1 worker. Average times: 3.67 ms. Total throughput: 272.60 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 2 workers. Average times: 7.20, 7.20 ms. Total throughput: 277.67 iter/sec.
[Work thread Feb 14 03:23] Timing 1792K FFT, 2 cpus, 1 worker. Average times: 4.43 ms. Total throughput: 225.57 iter/sec.
[Work thread Feb 14 03:23] Timing 1792K FFT, 2 cpus, 2 workers. Average times: 9.24, 8.61 ms. Total throughput: 224.36 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 1 worker. Average times: 5.05 ms. Total throughput: 198.00 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 2 workers. Average times: 10.26, 9.79 ms. Total throughput: 199.62 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 1 worker. Average times: 6.31 ms. Total throughput: 158.43 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 2 workers. Average times: 12.51, 12.51 ms. Total throughput: 159.83 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 1 worker. Average times: 7.88 ms. Total throughput: 126.94 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 2 workers. Average times: 16.51, 15.39 ms. Total throughput: 125.51 iter/sec.
[Work thread Feb 14 03:25] Timing 3584K FFT, 2 cpus, 1 worker. Average times: 9.20 ms. Total throughput: 108.75 iter/sec.
[Work thread Feb 14 03:26] Timing 3584K FFT, 2 cpus, 2 workers. Average times: 18.26, 17.89 ms. Total throughput: 110.63 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 1 worker. Average times: 10.55 ms. Total throughput: 94.78 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 2 workers. Average times: 20.70, 20.70 ms. Total throughput: 96.61 iter/sec.

Single rank:
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 4.549 ms., avg time: 4.560 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 5.789 ms., avg time: 5.800 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 6.920 ms., avg time: 6.930 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 8.945 ms., avg time: 8.963 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 9.365 ms., avg time: 9.378 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 11.980 ms., avg time: 11.988 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 14.513 ms., avg time: 14.525 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 17.111 ms., avg time: 17.156 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 19.565 ms., avg time: 19.578 ms.
[Work thread Feb 14 03:20] Timing FFTs using 2 threads on 2 physical CPUs.
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 2.751 ms., avg time: 2.763 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 3.496 ms., avg time: 3.513 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 4.050 ms., avg time: 4.095 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 4.952 ms., avg time: 4.993 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 5.687 ms., avg time: 5.714 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 7.372 ms., avg time: 7.387 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 8.716 ms., avg time: 8.803 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 10.557 ms., avg time: 10.651 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 11.389 ms., avg time: 11.480 ms.
[Work thread Feb 14 03:20] Benchmarking multiple workers to measure the impact of memory bandwidth
[Work thread Feb 14 03:20] Timing 1024K FFT, 2 cpus, 1 worker. Average times: 2.61 ms. Total throughput: 383.01 iter/sec.
[Work thread Feb 14 03:21] Timing 1024K FFT, 2 cpus, 2 workers. Average times: 5.25, 5.23 ms. Total throughput: 381.51 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 1 worker. Average times: 3.34 ms. Total throughput: 299.68 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 2 workers. Average times: 6.59, 6.56 ms. Total throughput: 304.07 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 1 worker. Average times: 3.96 ms. Total throughput: 252.35 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 2 workers. Average times: 7.90, 7.89 ms. Total throughput: 253.21 iter/sec.
[Work thread Feb 14 03:22] Timing 1792K FFT, 2 cpus, 1 worker. Average times: 4.84 ms. Total throughput: 206.76 iter/sec.
[Work thread Feb 14 03:23] Timing 1792K FFT, 2 cpus, 2 workers. Average times: 9.61, 9.61 ms. Total throughput: 208.08 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 1 worker. Average times: 5.52 ms. Total throughput: 181.32 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 2 workers. Average times: 10.87, 10.87 ms. Total throughput: 183.92 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 1 worker. Average times: 6.99 ms. Total throughput: 143.16 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 2 workers. Average times: 13.98, 13.89 ms. Total throughput: 143.52 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 1 worker. Average times: 8.47 ms. Total throughput: 118.00 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 2 workers. Average times: 16.75, 16.62 ms. Total throughput: 119.87 iter/sec.
[Work thread Feb 14 03:25] Timing 3584K FFT, 2 cpus, 1 worker. Average times: 10.03 ms. Total throughput: 99.69 iter/sec.
[Work thread Feb 14 03:26] Timing 3584K FFT, 2 cpus, 2 workers. Average times: 20.07, 19.75 ms. Total throughput: 100.46 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 1 worker. Average times: 11.44 ms. Total throughput: 87.38 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 2 workers. Average times: 22.67, 22.57 ms. Total throughput: 88.42 iter/sec.

TL;DR: get dual rank memory.
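To quantify the dual rank advantage, here is a minimal sketch that compares the 2-worker "Total throughput" figures from the two logs. The numbers are copied from the benchmark output in this post; the variable and function names are made up for illustration.

```python
# 2-worker total throughput (iter/sec) per FFT length, copied from the logs.
DUAL_RANK = {
    "1024K": 418.37, "1280K": 325.74, "1536K": 277.67,
    "1792K": 224.36, "2048K": 199.62, "2560K": 159.83,
    "3072K": 125.51, "3584K": 110.63, "4096K": 96.61,
}
SINGLE_RANK = {
    "1024K": 381.51, "1280K": 304.07, "1536K": 253.21,
    "1792K": 208.08, "2048K": 183.92, "2560K": 143.52,
    "3072K": 119.87, "3584K": 100.46, "4096K": 88.42,
}


def percent_gain(dual: dict, single: dict) -> dict:
    """Percentage throughput gain of dual rank over single rank per FFT length."""
    return {fft: (dual[fft] / single[fft] - 1) * 100 for fft in dual}


gains = percent_gain(DUAL_RANK, SINGLE_RANK)
for fft, gain in gains.items():
    print(f"{fft}: {gain:+.1f}%")
```

On these figures dual rank is ahead at every FFT length, by somewhere between about 5% and 11%.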
2017-02-14, 17:15  #736
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts 
Quote:


2017-03-08, 22:29  #737
"Victor de Hollander"
Aug 2011
the Netherlands
1176_{10} Posts 
Just for fun I ran the Prime95 benchmark on my old 11'' netbook with an AMD E-350 @ 1.6 GHz.
Code:
AMD E-350 Processor
CPU speed: 1596.06 MHz, 2 cores
CPU features: 3DNow! Prefetch, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 40
L2 TLBS: 512
Prime95 32-bit version 28.10, RdtscTiming=1
Best time for 1024K FFT length: 99.916 ms., avg: 103.876 ms.
Best time for 1280K FFT length: 142.989 ms., avg: 149.942 ms.
Best time for 1536K FFT length: 167.401 ms., avg: 174.861 ms.
Best time for 1792K FFT length: 196.051 ms., avg: 213.505 ms.
Best time for 2048K FFT length: 222.136 ms., avg: 231.517 ms.
Best time for 2560K FFT length: 275.761 ms., avg: 289.138 ms.
Best time for 3072K FFT length: 347.029 ms., avg: 357.043 ms.
Best time for 3584K FFT length: 403.094 ms., avg: 417.261 ms.
Best time for 4096K FFT length: 458.478 ms., avg: 477.660 ms.
Best time for 5120K FFT length: 645.214 ms., avg: 656.045 ms.
Best time for 6144K FFT length: 778.602 ms., avg: 801.066 ms.
Best time for 7168K FFT length: 936.351 ms., avg: 955.565 ms.
Best time for 8192K FFT length: 1056.663 ms., avg: 1081.029 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 50.489 ms., avg: 51.903 ms.
Best time for 1280K FFT length: 71.734 ms., avg: 74.501 ms.
Best time for 1536K FFT length: 83.726 ms., avg: 86.498 ms.
Best time for 1792K FFT length: 98.136 ms., avg: 101.268 ms.
Best time for 2048K FFT length: 111.295 ms., avg: 113.926 ms.
Best time for 2560K FFT length: 137.782 ms., avg: 144.518 ms.
Best time for 3072K FFT length: 172.050 ms., avg: 178.164 ms.
Best time for 3584K FFT length: 203.060 ms., avg: 208.392 ms.
Best time for 4096K FFT length: 230.258 ms., avg: 236.426 ms.
Best time for 5120K FFT length: 319.971 ms., avg: 327.913 ms.
Best time for 6144K FFT length: 397.567 ms., avg: 418.475 ms.
Best time for 7168K FFT length: 477.554 ms., avg: 487.654 ms.
Best time for 8192K FFT length: 524.522 ms., avg: 534.575 ms.
Timings for 1024K FFT length (2 cpus, 2 workers): 104.66, 98.42 ms. Throughput: 19.72 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 149.56, 140.02 ms. Throughput: 13.83 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 176.08, 163.24 ms. Throughput: 11.81 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 198.76, 186.35 ms. Throughput: 10.40 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 223.87, 209.46 ms. Throughput: 9.24 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 288.10, 267.54 ms. Throughput: 7.21 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 356.97, 335.54 ms. Throughput: 5.78 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 429.52, 401.99 ms. Throughput: 4.82 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 497.72, 464.37 ms. Throughput: 4.16 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 695.86, 655.48 ms. Throughput: 2.96 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 792.26, 738.64 ms. Throughput: 2.62 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 986.58, 920.18 ms. Throughput: 2.10 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 1096.14, 1023.91 ms. Throughput: 1.89 iter/sec.
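The "Throughput" column in the multi-worker lines appears to be the sum of each worker's iteration rate. A minimal sketch, assuming that relationship (it is inferred from the numbers above, not taken from the Prime95 source), with an illustrative function name:

```python
def total_throughput(worker_times_ms: list[float]) -> float:
    """Combined iterations per second for workers running concurrently.

    Each worker completes 1000 / t iterations per second, where t is its
    average time per iteration in milliseconds.
    """
    return sum(1000.0 / t for t in worker_times_ms)


# Per-worker timings copied from the 1024K and 8192K lines above.
print(f"1024K: {total_throughput([104.66, 98.42]):.2f} iter/sec")
print(f"8192K: {total_throughput([1096.14, 1023.91]):.2f} iter/sec")
```

Both values agree with the reported throughput to two decimal places. Small differences can show up on other logs because the printed per-worker times are themselves rounded.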