|
|
#726 |
|
Random Account
Aug 2009
11110101011₂ Posts |
Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
CPU speed: 3557.21 MHz, 4 cores
CPU features: 3DNow!, SSE, SSE2, SSE4, AVX
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.10, RdtscTiming=1
Timing FFTs using 1 thread.
Best time for 1024K FFT length: 9.508 ms., avg: 9.768 ms.
Best time for 1280K FFT length: 12.303 ms., avg: 12.418 ms.
Best time for 1536K FFT length: 14.999 ms., avg: 15.142 ms.
Best time for 1792K FFT length: 18.287 ms., avg: 18.356 ms.
Best time for 2048K FFT length: 20.227 ms., avg: 20.361 ms.
Best time for 2560K FFT length: 26.262 ms., avg: 26.380 ms.
Best time for 3072K FFT length: 31.668 ms., avg: 31.762 ms.
Best time for 3584K FFT length: 38.144 ms., avg: 38.364 ms.
Best time for 4096K FFT length: 42.237 ms., avg: 42.404 ms.
Best time for 5120K FFT length: 54.871 ms., avg: 54.996 ms.
Best time for 6144K FFT length: 68.655 ms., avg: 68.826 ms.
Best time for 7168K FFT length: 82.420 ms., avg: 82.663 ms.
Best time for 8192K FFT length: 90.886 ms., avg: 91.456 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 4.864 ms., avg: 4.918 ms.
Best time for 1280K FFT length: 6.314 ms., avg: 6.388 ms.
Best time for 1536K FFT length: 7.647 ms., avg: 7.741 ms.
Best time for 1792K FFT length: 9.385 ms., avg: 9.449 ms.
Best time for 2048K FFT length: 10.340 ms., avg: 10.423 ms.
Best time for 2560K FFT length: 13.370 ms., avg: 13.465 ms.
Best time for 3072K FFT length: 16.091 ms., avg: 16.292 ms.
Best time for 3584K FFT length: 19.393 ms., avg: 19.624 ms.
Best time for 4096K FFT length: 21.453 ms., avg: 21.588 ms.
Best time for 5120K FFT length: 27.850 ms., avg: 28.476 ms.
Best time for 6144K FFT length: 34.854 ms., avg: 35.100 ms.
Best time for 7168K FFT length: 41.837 ms., avg: 42.006 ms.
Best time for 8192K FFT length: 46.029 ms., avg: 46.188 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 3.412 ms., avg: 3.462 ms.
Best time for 1280K FFT length: 4.457 ms., avg: 4.533 ms.
Best time for 1536K FFT length: 5.287 ms., avg: 5.401 ms.
Best time for 1792K FFT length: 6.556 ms., avg: 6.645 ms.
Best time for 2048K FFT length: 7.277 ms., avg: 7.350 ms.
Best time for 2560K FFT length: 9.316 ms., avg: 9.495 ms.
Best time for 3072K FFT length: 11.275 ms., avg: 11.354 ms.
Best time for 3584K FFT length: 13.431 ms., avg: 13.660 ms.
Best time for 4096K FFT length: 14.977 ms., avg: 15.127 ms.
Best time for 5120K FFT length: 19.226 ms., avg: 19.463 ms.
Best time for 6144K FFT length: 24.403 ms., avg: 24.689 ms.
Best time for 7168K FFT length: 28.934 ms., avg: 29.159 ms.
Best time for 8192K FFT length: 31.774 ms., avg: 32.199 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 2.730 ms., avg: 2.804 ms.
Best time for 1280K FFT length: 3.577 ms., avg: 3.688 ms.
Best time for 1536K FFT length: 4.234 ms., avg: 4.320 ms.
Best time for 1792K FFT length: 5.218 ms., avg: 5.423 ms.
Best time for 2048K FFT length: 5.985 ms., avg: 6.203 ms.
Best time for 2560K FFT length: 7.661 ms., avg: 7.855 ms.
Best time for 3072K FFT length: 9.270 ms., avg: 9.394 ms.
Best time for 3584K FFT length: 11.039 ms., avg: 11.281 ms.
Best time for 4096K FFT length: 12.421 ms., avg: 12.651 ms.
Best time for 5120K FFT length: 15.811 ms., avg: 15.986 ms.
Best time for 6144K FFT length: 20.161 ms., avg: 20.647 ms.
Best time for 7168K FFT length: 23.814 ms., avg: 24.115 ms.
Best time for 8192K FFT length: 26.010 ms., avg: 26.269 ms.
Timings for 1024K FFT length (4 cpus, 4 workers): 11.17, 11.15, 11.15, 11.16 ms. Throughput: 358.47 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 14.62, 14.55, 14.55, 14.57 ms. Throughput: 274.43 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 16.92, 16.81, 16.74, 16.74 ms. Throughput: 238.06 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 20.66, 20.37, 20.73, 20.36 ms. Throughput: 194.87 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 26.05, 25.39, 26.14, 26.14 ms. Throughput: 154.29 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 28.62, 28.13, 28.07, 28.23 ms. Throughput: 141.54 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 37.11, 38.15, 36.84, 37.20 ms. Throughput: 107.18 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 42.48, 42.02, 41.94, 42.10 ms. Throughput: 94.94 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 47.66, 47.15, 46.92, 46.69 ms. Throughput: 84.92 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 61.24, 60.11, 59.79, 60.09 ms. Throughput: 66.33 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 75.05, 74.09, 73.57, 74.10 ms. Throughput: 53.91 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 92.21, 90.88, 90.51, 91.44 ms. Throughput: 43.83 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 106.61, 108.07, 111.84, 106.36 ms. Throughput: 36.98 iter/sec.
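For anyone comparing the "4 threads" and "4 workers" rows above, here is a minimal sketch of how the throughput figures relate to the per-iteration timings. It assumes throughput is simply the sum of 1000 / ms over all concurrently running workers, which matches the reported 1024K figure to within rounding; the function name `throughput` is made up for illustration and is not part of Prime95.

```python
def throughput(times_ms):
    """Total iterations per second for workers running at the same time.

    times_ms holds one per-iteration time (in ms) per worker.
    """
    return sum(1000.0 / t for t in times_ms)


# 1024K FFT figures from the i5-3570 results in this post.
four_workers = throughput([11.17, 11.15, 11.15, 11.16])  # 4 workers, 1 core each
one_worker = throughput([2.804])                          # 1 worker, 4 threads (avg time)

print(f"4 workers:           {four_workers:.2f} iter/sec")
print(f"1 worker, 4 threads: {one_worker:.2f} iter/sec")
```

On this machine the two configurations land within about 1% of each other at 1024K, so the choice probably matters more at the larger FFT lengths.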
|
|
|
|
|
#727 |
|
"/X\(‘-‘)/X\"
Jan 2013
2×5×293 Posts |
I wanted to see how changing the CPU frequency would affect the power consumption of my i5-6600 DDR4-2133 compute cluster. I created a spreadsheet with timings of 4 threads and 4 workers.
We've noticed before that Skylake gets more throughput using 4 threads instead of 4 workers, but it turns out that this changes with CPU frequency. When I underclock the CPU to 3.3 GHz, using 4 workers begins to be more performant. Spreadsheet.
Underclocking is proving to be an efficiency win. I haven't yet played with undervolting. I also intend to gather more data while running 3 threads/workers. So far 4 threads/workers at 3.3 GHz is faster than 3 threads at 3.7 GHz. I haven't yet disabled a core in the BIOS to check power consumption differences. I strongly suspect 4 cores will win.
|
|
|
|
|
#728 |
|
"Jacob"
Sep 2006
Brussels, Belgium
11010101110₂ Posts |
|
|
|
|
|
|
#729 |
|
Jun 2003
11737₈ Posts |
Based on the data, I think this statement is too strong. Even at 3.3 GHz, 1 worker is more performant than 4 workers, except for 3 FFTs where the difference is so small as to be negligible. Of course, the trend is there, so maybe 3.2 GHz and below might show a clear superiority for the 4 workers.
|
|
|
|
|
|
#730 |
|
Oct 2007
Manchester, UK
54D₁₆ Posts |
Of course, the real metric should be performance / total cost of ownership.
Add up the cost of the computer plus the cost of electricity, both already used and expected in future (based on the chosen clock speed). Optionally also include the cost of your time spent setting up and configuring the machine. :)
|
|
|
|
|
#731 |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
14123₈ Posts |
And if you lose one week of work trying to figure out the sweet spot and gain 3% then you have to work 24/7 for 32 weeks just to break even.
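A rough sketch of that break-even arithmetic, assuming the lost week is recovered only through the extra 3% of output gained per week afterwards. The function name `weeks_to_break_even` is hypothetical.

```python
def weeks_to_break_even(weeks_lost, speedup):
    """Weeks of running at the improved rate needed to recover lost work.

    speedup is the fractional gain, e.g. 0.03 for 3%.
    """
    return weeks_lost / speedup


weeks = weeks_to_break_even(1.0, 0.03)
print(f"Break-even after about {weeks:.1f} weeks")
```

Under this assumption the answer comes out at a little over 33 weeks, in the same ballpark as the figure quoted above.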
|
|
|
|
|
|
#732 | ||
|
"/X\(‘-‘)/X\"
Jan 2013
2·5·293 Posts |
Quote:
Most impressive to me so far is that I can run at 3.3 GHz for 95% performance for 86% of running costs, before undervolting.
Quote:
It would be hard to lose a week of work. I'm only experimenting on a single node, and when I am measuring the power draw I'm doing DC. Testing stability with undervolting will certainly take more time!
||
|
|
|
|
|
#733 | |
|
Oct 2007
Manchester, UK
2515₈ Posts |
Quote:
How long do you plan on running these machines, and what do you project an extra ~10 W / machine will cost over such a time?
Edit: A rough calculation for 4 machines. 10 W / machine is ~1 kWh / day in total, and based on these figures electricity costs 11.16 c/kWh in Toronto now, but I'll round it up to 12 c/kWh. Over 4 years that comes to $175.
Last fiddled with by lavalamp on 2016-12-26 at 23:35
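The rough calculation can be reproduced with a few lines. This is only a sketch: `energy_cost` and `DAYS_PER_YEAR` are names chosen here for illustration, and the 365.25-day year is an assumption.

```python
DAYS_PER_YEAR = 365.25  # assumed average year length


def energy_cost(kwh_per_day, years, dollars_per_kwh):
    """Total electricity cost in dollars over the given number of years."""
    return kwh_per_day * DAYS_PER_YEAR * years * dollars_per_kwh


# 4 machines at an extra 10 W each, running 24 hours a day.
exact_kwh_per_day = 4 * 10 * 24 / 1000.0

cost_rounded = energy_cost(1.0, 4, 0.12)              # using the rounded ~1 kWh/day
cost_exact = energy_cost(exact_kwh_per_day, 4, 0.12)  # without rounding the daily energy

print(f"Rounded estimate:   ${cost_rounded:.0f}")
print(f"Unrounded estimate: ${cost_exact:.0f}")
```

The rounded inputs give the $175 quoted in the post; without rounding the daily energy the total comes out slightly lower.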
|
|
|
|
|
|
#734 | ||
|
"/X\(‘-‘)/X\"
Jan 2013
2·5·293 Posts |
Unfortunately I wasn't able to underclock my CPU. I didn't know the fixed multiplier also prevents underclocking. I have yet to try undervolting.
Quote:
Quote:
The cluster, not including the Ethernet switch, consumes 370 watts when running at stock clocks. That's 266 kWh/month, or $591/year. So if I can save 50 watts across the cluster, that's 36 kWh/month, or $80/year. I figure I'll run the cluster at most 2.5 years more. |
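As a sanity check on those figures, here is a small sketch that converts watts into kWh per month and backs out the electricity rate implied by the $591/year number. It assumes a 30-day month; the rate itself is not stated in the post, so `implied_rate` is derived from the post's own numbers and the names are illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def monthly_kwh(watts):
    """Energy used per month by a constant load, in kWh."""
    return watts * HOURS_PER_MONTH / 1000.0


cluster_kwh = monthly_kwh(370)  # whole cluster at stock clocks
saved_kwh = monthly_kwh(50)     # hoped-for saving across the cluster

# Rate implied by "$591/year" for the whole cluster (dollars per kWh).
implied_rate = 591.0 / (cluster_kwh * 12)
yearly_saving = saved_kwh * 12 * implied_rate

print(f"Cluster: {cluster_kwh:.0f} kWh/month, saving: {saved_kwh:.0f} kWh/month")
print(f"Implied rate: {implied_rate * 100:.1f} c/kWh, saving: ${yearly_saving:.0f}/year")
```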
||
|
|
|
|
|
#735 |
|
"/X\(‘-‘)/X\"
Jan 2013
5562₈ Posts |
Single Rank versus Dual Rank DDR3
I now have two i3-4170 systems, both running 2x4 GB of DDR3-1600 RAM in dual channel configuration.
Dual rank:
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 4.355 ms., avg time: 4.413 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 5.554 ms., avg time: 5.562 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 6.644 ms., avg time: 6.651 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 8.000 ms., avg time: 8.007 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 9.072 ms., avg time: 9.084 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 11.528 ms., avg time: 11.545 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 13.875 ms., avg time: 13.886 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 16.453 ms., avg time: 16.466 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 18.976 ms., avg time: 18.991 ms.
[Work thread Feb 14 03:20] Timing FFTs using 2 threads on 2 physical CPUs.
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 2.401 ms., avg time: 2.417 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 3.047 ms., avg time: 3.114 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 3.651 ms., avg time: 3.732 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 5.132 ms., avg time: 5.152 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 5.079 ms., avg time: 5.102 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 6.309 ms., avg time: 6.408 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 7.651 ms., avg time: 7.779 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 9.211 ms., avg time: 9.278 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 10.455 ms., avg time: 10.505 ms.
[Work thread Feb 14 03:20] Benchmarking multiple workers to measure the impact of memory bandwidth
[Work thread Feb 14 03:20] Timing 1024K FFT, 2 cpus, 1 worker. Average times: 2.41 ms. Total throughput: 414.57 iter/sec.
[Work thread Feb 14 03:21] Timing 1024K FFT, 2 cpus, 2 workers. Average times: 4.79, 4.77 ms. Total throughput: 418.37 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 1 worker. Average times: 3.10 ms. Total throughput: 323.10 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 2 workers. Average times: 6.29, 6.00 ms. Total throughput: 325.74 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 1 worker. Average times: 3.67 ms. Total throughput: 272.60 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 2 workers. Average times: 7.20, 7.20 ms. Total throughput: 277.67 iter/sec.
[Work thread Feb 14 03:23] Timing 1792K FFT, 2 cpus, 1 worker. Average times: 4.43 ms. Total throughput: 225.57 iter/sec.
[Work thread Feb 14 03:23] Timing 1792K FFT, 2 cpus, 2 workers. Average times: 9.24, 8.61 ms. Total throughput: 224.36 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 1 worker. Average times: 5.05 ms. Total throughput: 198.00 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 2 workers. Average times: 10.26, 9.79 ms. Total throughput: 199.62 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 1 worker. Average times: 6.31 ms. Total throughput: 158.43 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 2 workers. Average times: 12.51, 12.51 ms. Total throughput: 159.83 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 1 worker. Average times: 7.88 ms. Total throughput: 126.94 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 2 workers. Average times: 16.51, 15.39 ms. Total throughput: 125.51 iter/sec.
[Work thread Feb 14 03:25] Timing 3584K FFT, 2 cpus, 1 worker. Average times: 9.20 ms. Total throughput: 108.75 iter/sec.
[Work thread Feb 14 03:26] Timing 3584K FFT, 2 cpus, 2 workers. Average times: 18.26, 17.89 ms. Total throughput: 110.63 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 1 worker. Average times: 10.55 ms. Total throughput: 94.78 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 2 workers. Average times: 20.70, 20.70 ms. Total throughput: 96.61 iter/sec.
Single rank:
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 4.549 ms., avg time: 4.560 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 5.789 ms., avg time: 5.800 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 6.920 ms., avg time: 6.930 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 8.945 ms., avg time: 8.963 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 9.365 ms., avg time: 9.378 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 11.980 ms., avg time: 11.988 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 14.513 ms., avg time: 14.525 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 17.111 ms., avg time: 17.156 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 19.565 ms., avg time: 19.578 ms.
[Work thread Feb 14 03:20] Timing FFTs using 2 threads on 2 physical CPUs.
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 2.751 ms., avg time: 2.763 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 3.496 ms., avg time: 3.513 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 4.050 ms., avg time: 4.095 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 4.952 ms., avg time: 4.993 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 5.687 ms., avg time: 5.714 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 7.372 ms., avg time: 7.387 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 8.716 ms., avg time: 8.803 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 10.557 ms., avg time: 10.651 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 11.389 ms., avg time: 11.480 ms.
[Work thread Feb 14 03:20] Benchmarking multiple workers to measure the impact of memory bandwidth
[Work thread Feb 14 03:20] Timing 1024K FFT, 2 cpus, 1 worker. Average times: 2.61 ms. Total throughput: 383.01 iter/sec.
[Work thread Feb 14 03:21] Timing 1024K FFT, 2 cpus, 2 workers. Average times: 5.25, 5.23 ms. Total throughput: 381.51 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 1 worker. Average times: 3.34 ms. Total throughput: 299.68 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 2 workers. Average times: 6.59, 6.56 ms. Total throughput: 304.07 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 1 worker. Average times: 3.96 ms. Total throughput: 252.35 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 2 workers. Average times: 7.90, 7.89 ms. Total throughput: 253.21 iter/sec.
[Work thread Feb 14 03:22] Timing 1792K FFT, 2 cpus, 1 worker. Average times: 4.84 ms. Total throughput: 206.76 iter/sec.
[Work thread Feb 14 03:23] Timing 1792K FFT, 2 cpus, 2 workers. Average times: 9.61, 9.61 ms. Total throughput: 208.08 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 1 worker. Average times: 5.52 ms. Total throughput: 181.32 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 2 workers. Average times: 10.87, 10.87 ms. Total throughput: 183.92 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 1 worker. Average times: 6.99 ms. Total throughput: 143.16 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 2 workers. Average times: 13.98, 13.89 ms. Total throughput: 143.52 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 1 worker. Average times: 8.47 ms. Total throughput: 118.00 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 2 workers. Average times: 16.75, 16.62 ms. Total throughput: 119.87 iter/sec.
[Work thread Feb 14 03:25] Timing 3584K FFT, 2 cpus, 1 worker. Average times: 10.03 ms. Total throughput: 99.69 iter/sec.
[Work thread Feb 14 03:26] Timing 3584K FFT, 2 cpus, 2 workers. Average times: 20.07, 19.75 ms. Total throughput: 100.46 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 1 worker. Average times: 11.44 ms. Total throughput: 87.38 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 2 workers. Average times: 22.67, 22.57 ms. Total throughput: 88.42 iter/sec.
TL;DR: get dual rank memory.
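To put a number on the TL;DR, here is a minimal sketch that compares the 2-worker throughput of the two systems at a few FFT lengths. The values are copied from the logs in this post; the selection of three FFT lengths and the helper name `percent_gain` are arbitrary choices made for illustration.

```python
# 2-worker total throughput in iter/sec: (dual rank, single rank).
two_worker_throughput = {
    "1024K": (418.37, 381.51),
    "2048K": (199.62, 183.92),
    "4096K": (96.61, 88.42),
}


def percent_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return (new / old - 1.0) * 100.0


gains = {
    fft: percent_gain(dual, single)
    for fft, (dual, single) in two_worker_throughput.items()
}

for fft, gain in gains.items():
    print(f"{fft}: dual rank is {gain:.1f}% faster")
```

For these three FFT lengths the dual rank system comes out somewhere between 8% and 10% ahead.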
|
|
|
|
|
#736 | |
|
"Kieren"
Jul 2011
In My Own Galaxy!
10011110101110₂ Posts |
Quote:
|
|
|
|
|
|
|
#737 |
|
"Victor de Hollander"
Aug 2011
the Netherlands
2³·3·7² Posts |
Just for fun I ran the Prime95 benchmark on my old 11'' netbook with AMD E-350 @1.6GHz
Code:
AMD E-350 Processor
CPU speed: 1596.06 MHz, 2 cores
CPU features: 3DNow! Prefetch, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 40
L2 TLBS: 512
Prime95 32-bit version 28.10, RdtscTiming=1
Best time for 1024K FFT length: 99.916 ms., avg: 103.876 ms.
Best time for 1280K FFT length: 142.989 ms., avg: 149.942 ms.
Best time for 1536K FFT length: 167.401 ms., avg: 174.861 ms.
Best time for 1792K FFT length: 196.051 ms., avg: 213.505 ms.
Best time for 2048K FFT length: 222.136 ms., avg: 231.517 ms.
Best time for 2560K FFT length: 275.761 ms., avg: 289.138 ms.
Best time for 3072K FFT length: 347.029 ms., avg: 357.043 ms.
Best time for 3584K FFT length: 403.094 ms., avg: 417.261 ms.
Best time for 4096K FFT length: 458.478 ms., avg: 477.660 ms.
Best time for 5120K FFT length: 645.214 ms., avg: 656.045 ms.
Best time for 6144K FFT length: 778.602 ms., avg: 801.066 ms.
Best time for 7168K FFT length: 936.351 ms., avg: 955.565 ms.
Best time for 8192K FFT length: 1056.663 ms., avg: 1081.029 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 50.489 ms., avg: 51.903 ms.
Best time for 1280K FFT length: 71.734 ms., avg: 74.501 ms.
Best time for 1536K FFT length: 83.726 ms., avg: 86.498 ms.
Best time for 1792K FFT length: 98.136 ms., avg: 101.268 ms.
Best time for 2048K FFT length: 111.295 ms., avg: 113.926 ms.
Best time for 2560K FFT length: 137.782 ms., avg: 144.518 ms.
Best time for 3072K FFT length: 172.050 ms., avg: 178.164 ms.
Best time for 3584K FFT length: 203.060 ms., avg: 208.392 ms.
Best time for 4096K FFT length: 230.258 ms., avg: 236.426 ms.
Best time for 5120K FFT length: 319.971 ms., avg: 327.913 ms.
Best time for 6144K FFT length: 397.567 ms., avg: 418.475 ms.
Best time for 7168K FFT length: 477.554 ms., avg: 487.654 ms.
Best time for 8192K FFT length: 524.522 ms., avg: 534.575 ms.
Timings for 1024K FFT length (2 cpus, 2 workers): 104.66, 98.42 ms. Throughput: 19.72 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 149.56, 140.02 ms. Throughput: 13.83 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 176.08, 163.24 ms. Throughput: 11.81 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 198.76, 186.35 ms. Throughput: 10.40 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 223.87, 209.46 ms. Throughput: 9.24 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 288.10, 267.54 ms. Throughput: 7.21 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 356.97, 335.54 ms. Throughput: 5.78 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 429.52, 401.99 ms. Throughput: 4.82 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 497.72, 464.37 ms. Throughput: 4.16 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 695.86, 655.48 ms. Throughput: 2.96 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 792.26, 738.64 ms. Throughput: 2.62 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 986.58, 920.18 ms. Throughput: 2.10 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 1096.14, 1023.91 ms. Throughput: 1.89 iter/sec.
|
|
|
|
|
#738 |
|
Mar 2017
PNW
1 Posts |
AMD Ryzen 7 1800X Eight-Core Processor
CPU speed: 3447.35 MHz, 16 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 512 KB, L3 cache size: 16 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1536
AMD Ryzen 7 1800X Eight-Core Processor
CPU speed: 3816.00 MHz, 16 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 512 KB, L3 cache size: 16 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1536
Prime95 64-bit version 28.10, RdtscTiming=1
Prime95 64-bit version 28.10, RdtscTiming=1
AMD Ryzen 7 1800X Eight-Core Processor
CPU speed: 3545.99 MHz, 16 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 512 KB, L3 cache size: 16 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1536
AMD Ryzen 7 1800X Eight-Core Processor
CPU speed: 3816.00 MHz, 16 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 512 KB, L3 cache size: 16 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1536
Prime95 64-bit version 28.10, RdtscTiming=1
AMD Ryzen 7 1800X Eight-Core Processor
CPU speed: 3592.12 MHz, 16 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 512 KB, L3 cache size: 16 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1536
Prime95 64-bit version 28.10, RdtscTiming=1
Prime95 64-bit version 28.10, RdtscTiming=1
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
AMD Ryzen 7 1800X Eight-Core Processor
CPU speed: 3422.28 MHz, 16 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 512 KB, L3 cache size: 16 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1536
Prime95 64-bit version 28.10, RdtscTiming=1
Best time for 1024K FFT length: 13.807 ms., avg: 14.273 ms.
Best time for 1280K FFT length: 18.159 ms., avg: 18.537 ms.
Best time for 1536K FFT length: 22.682 ms., avg: 23.107 ms.
Best time for 1792K FFT length: 26.061 ms., avg: 26.566 ms.
Best time for 2048K FFT length: 29.822 ms., avg: 30.188 ms.
Best time for 2560K FFT length: 37.698 ms., avg: 39.824 ms.
Best time for 3072K FFT length: 45.986 ms., avg: 46.468 ms.
Best time for 3584K FFT length: 53.277 ms., avg: 53.670 ms.
Best time for 4096K FFT length: 61.075 ms., avg: 61.465 ms.
Best time for 5120K FFT length: 79.219 ms., avg: 79.853 ms.
Best time for 6144K FFT length: 97.915 ms., avg: 98.519 ms.
Best time for 7168K FFT length: 114.295 ms., avg: 115.312 ms.
Best time for 8192K FFT length: 126.941 ms., avg: 127.963 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 7.076 ms., avg: 7.169 ms.
Best time for 1280K FFT length: 9.170 ms., avg: 9.299 ms.
Best time for 1536K FFT length: 11.285 ms., avg: 11.411 ms.
Best time for 1792K FFT length: 13.462 ms., avg: 13.579 ms.
Best time for 2048K FFT length: 14.929 ms., avg: 15.151 ms.
Best time for 2560K FFT length: 18.810 ms., avg: 19.041 ms.
Best time for 3072K FFT length: 23.086 ms., avg: 23.249 ms.
Best time for 3584K FFT length: 27.528 ms., avg: 27.643 ms.
Best time for 4096K FFT length: 30.559 ms., avg: 30.740 ms.
Best time for 5120K FFT length: 39.988 ms., avg: 40.245 ms.
Best time for 6144K FFT length: 48.979 ms., avg: 49.186 ms.
Best time for 7168K FFT length: 58.807 ms., avg: 59.122 ms.
Best time for 8192K FFT length: 63.872 ms., avg: 64.181 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 4.809 ms., avg: 4.861 ms.
Best time for 1280K FFT length: 6.261 ms., avg: 6.294 ms.
Best time for 1536K FFT length: 7.697 ms., avg: 7.875 ms.
Best time for 1792K FFT length: 9.046 ms., avg: 9.085 ms.
Best time for 2048K FFT length: 10.120 ms., avg: 10.228 ms.
Best time for 2560K FFT length: 12.718 ms., avg: 13.066 ms.
Best time for 3072K FFT length: 15.739 ms., avg: 15.773 ms.
Best time for 3584K FFT length: 18.524 ms., avg: 18.633 ms.
Best time for 4096K FFT length: 20.753 ms., avg: 20.859 ms.
Best time for 5120K FFT length: 27.089 ms., avg: 27.224 ms.
Best time for 6144K FFT length: 33.261 ms., avg: 33.396 ms.
Best time for 7168K FFT length: 39.614 ms., avg: 39.760 ms.
Best time for 8192K FFT length: 43.169 ms., avg: 43.302 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 3.602 ms., avg: 3.636 ms.
Best time for 1280K FFT length: 4.677 ms., avg: 4.888 ms.
Best time for 1536K FFT length: 5.749 ms., avg: 5.790 ms.
Best time for 1792K FFT length: 6.857 ms., avg: 7.013 ms.
Best time for 2048K FFT length: 7.604 ms., avg: 7.692 ms.
Best time for 2560K FFT length: 9.624 ms., avg: 9.911 ms.
Best time for 3072K FFT length: 11.773 ms., avg: 11.844 ms.
Best time for 3584K FFT length: 14.034 ms., avg: 14.151 ms.
Best time for 4096K FFT length: 15.621 ms., avg: 15.658 ms.
Best time for 5120K FFT length: 20.389 ms., avg: 20.476 ms.
Best time for 6144K FFT length: 25.047 ms., avg: 25.197 ms.
Best time for 7168K FFT length: 30.019 ms., avg: 30.175 ms.
Best time for 8192K FFT length: 32.537 ms., avg: 32.675 ms.
Timing FFTs using 5 threads.
Best time for 1024K FFT length: 2.925 ms., avg: 2.953 ms.
Best time for 1280K FFT length: 3.802 ms., avg: 3.868 ms.
Best time for 1536K FFT length: 4.691 ms., avg: 4.757 ms.
Best time for 1792K FFT length: 5.526 ms., avg: 5.581 ms.
Best time for 2048K FFT length: 6.164 ms., avg: 6.211 ms.
Best time for 2560K FFT length: 7.778 ms., avg: 7.811 ms.
Best time for 3072K FFT length: 9.525 ms., avg: 10.635 ms.
Best time for 3584K FFT length: 11.330 ms., avg: 11.412 ms.
Best time for 4096K FFT length: 12.643 ms., avg: 12.736 ms.
Best time for 5120K FFT length: 16.489 ms., avg: 17.282 ms.
Best time for 6144K FFT length: 20.238 ms., avg: 20.950 ms.
Best time for 7168K FFT length: 24.345 ms., avg: 24.513 ms.
Best time for 8192K FFT length: 26.286 ms., avg: 26.744 ms.
Timing FFTs using 6 threads.
Best time for 1024K FFT length: 2.470 ms., avg: 2.858 ms.
Best time for 1280K FFT length: 3.207 ms., avg: 3.360 ms.
Best time for 1536K FFT length: 3.912 ms., avg: 3.999 ms.
Best time for 1792K FFT length: 4.741 ms., avg: 5.378 ms.
Best time for 2048K FFT length: 5.243 ms., avg: 5.828 ms.
Best time for 2560K FFT length: 6.570 ms., avg: 7.218 ms.
Best time for 3072K FFT length: 8.085 ms., avg: 8.629 ms.
Best time for 3584K FFT length: 9.679 ms., avg: 11.073 ms.
Best time for 4096K FFT length: 10.725 ms., avg: 11.134 ms.
Best time for 5120K FFT length: 13.989 ms., avg: 14.172 ms.
Best time for 6144K FFT length: 17.201 ms., avg: 17.386 ms.
Best time for 7168K FFT length: 21.077 ms., avg: 21.710 ms.
Best time for 8192K FFT length: 22.249 ms., avg: 22.662 ms.
Timing FFTs using 7 threads.
Best time for 1024K FFT length: 2.160 ms., avg: 2.268 ms.
Best time for 1280K FFT length: 3.178 ms., avg: 4.154 ms.
Best time for 1536K FFT length: 3.406 ms., avg: 3.569 ms.
Best time for 1792K FFT length: 4.156 ms., avg: 4.415 ms.
Best time for 2048K FFT length: 4.602 ms., avg: 4.904 ms.
Best time for 2560K FFT length: 5.741 ms., avg: 6.230 ms.
Best time for 3072K FFT length: 7.109 ms., avg: 7.462 ms.
|
|
|
|
|
#739 |
|
"/X\(‘-‘)/X\"
Jan 2013
101101110010₂ Posts |
Clear Linux gives a small boost in mprime throughput.
I've recently been playing around with Intel's Clear Linux distribution. It's compiled with optimizations and built specifically for Intel's latest processors. Given that mprime's LL is mostly hand-tuned assembly, I wasn't expecting to see a difference in performance compared to Ubuntu 16.04, but I have.
I'm running my cluster of i5-6600s at 3.3 GHz, as the dual rank, dual channel DDR4-2133 makes it not worth the watts to run the CPUs any faster. That being said, Clear Linux at 3.3 GHz is up to 3% faster than Ubuntu at 3.6 GHz. I've updated my benchmark spreadsheet.
My guess is the difference comes down to different kernels and fewer background tasks running.
|
|
|
|
|
#740 |
|
"/X\(‘-‘)/X\"
Jan 2013
2·5·293 Posts |
I finally got around to experimenting with undervolting. So far I've lowered VCore by 0.10 volts and I've passed a 7 hour stress test. The result? Saved another 12.5 watts per node, so my 4 node cluster is now consuming only 270 watts at the wall, or 243 watts from the nodes (at 3.3 GHz all cores). With a 4096K FFT, 4 cores take 5.37 ms/iter on each node, for 2.76 iter/sec/watt at the wall, or 3.06 iter/sec/watt from the nodes.
Compare that to the GTX 1080 Ti, which consumes 180 watts from the card to get 2.63 ms/iter, for 2.12 iter/sec/watt. I wasn't expecting CPUs to be 44% more efficient. I'm going to try lowering VCore more soon. I might have to add more nodes to this power supply.
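The efficiency figures can be reproduced roughly as follows. This sketch assumes the 5.37 ms/iter applies to each of the 4 nodes, all running the same FFT length at once, which is inferred from the arithmetic rather than stated outright; `iters_per_sec_per_watt` is a name made up here.

```python
def iters_per_sec_per_watt(ms_per_iter, watts, units=1):
    """Iterations per second per watt for `units` identical devices
    that together draw `watts`."""
    return units * (1000.0 / ms_per_iter) / watts


cpu_wall = iters_per_sec_per_watt(5.37, 270, units=4)   # cluster, measured at the wall
cpu_nodes = iters_per_sec_per_watt(5.37, 243, units=4)  # cluster, measured at the nodes
gpu = iters_per_sec_per_watt(2.63, 180)                 # GTX 1080 Ti, card power only

advantage = (cpu_nodes / gpu - 1.0) * 100.0

print(f"CPU at wall:  {cpu_wall:.2f} iter/sec/watt")
print(f"CPU at nodes: {cpu_nodes:.2f} iter/sec/watt")
print(f"GPU:          {gpu:.2f} iter/sec/watt")
print(f"CPU advantage over GPU: about {advantage:.0f}%")
```

The results agree with the posted numbers to within rounding. Note that the GPU figure covers the card alone, so the node-level CPU figure is the closer comparison.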
|
|
|
|
|
#741 |
|
Banned
"Luigi"
Aug 2002
Team Italia
61·79 Posts |
Code:
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Celeron(R) CPU N2840 @ 2.16GHz
CPU speed: 2557.70 MHz, 2 cores
CPU features: Prefetchw, SSE, SSE2, SSE4
L1 cache size: 24 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Machine topology as determined by hwloc library:
Machine#0 (total=2492796KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
NUMANode#0 (local=2492796KB, total=2492796KB)
Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=55, CPUModel="Intel(R) Celeron(R) CPU N2840 @ 2.16GHz", CPUStepping=8)
L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
Core (cpuset: 0x00000001)
PU#0 (cpuset: 0x00000001)
L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
Core (cpuset: 0x00000002)
PU#1 (cpuset: 0x00000002)
Prime95 64-bit version 29.1, RdtscTiming=1
Timing FFTs using 2 cores.
Best time for 1024K FFT length: 22.048 ms., avg: 23.196 ms.
Best time for 1280K FFT length: 29.125 ms., avg: 30.133 ms.
Best time for 1536K FFT length: 35.795 ms., avg: 36.288 ms.
Best time for 1792K FFT length: 45.152 ms., avg: 46.324 ms.
Best time for 2048K FFT length: 47.919 ms., avg: 49.040 ms.
Best time for 2560K FFT length: 60.895 ms., avg: 64.173 ms.
Best time for 3072K FFT length: 77.295 ms., avg: 80.964 ms.
Best time for 3584K FFT length: 97.452 ms., avg: 98.772 ms.
Best time for 4096K FFT length: 117.728 ms., avg: 118.825 ms.
Best time for 5120K FFT length: 144.734 ms., avg: 148.510 ms.
Best time for 6144K FFT length: 186.521 ms., avg: 188.309 ms.
Best time for 7168K FFT length: 282.553 ms., avg: 284.238 ms.
Best time for 8192K FFT length: 302.990 ms., avg: 306.707 ms.
|
|
|
|
|
|
#742 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
3427₁₀ Posts |
Almost useful to me, except you only posted timings for 2 cores, not the single-thread test I need for benchmarks.
|
|
|
|
|
|
#743 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
61×79 Posts |
Quote:
Here are the results for the option Throughput benchmark...
Code:
[Mon Apr 10 10:53:20 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Celeron(R) CPU N2840 @ 2.16GHz
CPU speed: 2557.77 MHz, 2 cores
CPU features: Prefetchw, SSE, SSE2, SSE4
L1 cache size: 24 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Machine topology as determined by hwloc library:
Machine#0 (total=2170248KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
NUMANode#0 (local=2170248KB, total=2170248KB)
Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=55, CPUModel="Intel(R) Celeron(R) CPU N2840 @ 2.16GHz", CPUStepping=8)
L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
Core (cpuset: 0x00000001)
PU#0 (cpuset: 0x00000001)
L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
Core (cpuset: 0x00000002)
PU#1 (cpuset: 0x00000002)
Prime95 64-bit version 29.1, RdtscTiming=1
Timings for 1024K FFT length (2 cpus, 1 worker): 23.38 ms. Throughput: 42.76 iter/sec.
Timings for 1024K FFT length (2 cpus, 2 workers): 46.76, 46.03 ms. Throughput: 43.11 iter/sec.
Timings for 1280K FFT length (2 cpus, 1 worker): 30.89 ms. Throughput: 32.37 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 61.83, 60.54 ms. Throughput: 32.69 iter/sec.
Timings for 1536K FFT length (2 cpus, 1 worker): 37.43 ms. Throughput: 26.72 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 76.86, 74.73 ms. Throughput: 26.39 iter/sec.
Timings for 1792K FFT length (2 cpus, 1 worker): 48.25 ms. Throughput: 20.73 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 97.16, 91.82 ms. Throughput: 21.18 iter/sec.
Timings for 2048K FFT length (2 cpus, 1 worker): 51.60 ms. Throughput: 19.38 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 103.15, 99.23 ms. Throughput: 19.77 iter/sec.
Timings for 2560K FFT length (2 cpus, 1 worker): 64.17 ms. Throughput: 15.58 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 128.12, 124.46 ms. Throughput: 15.84 iter/sec.
Timings for 3072K FFT length (2 cpus, 1 worker): 80.92 ms. Throughput: 12.36 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 217.55, 216.71 ms. Throughput: 9.21 iter/sec.
Timings for 3584K FFT length (2 cpus, 1 worker): 114.14 ms. Throughput: 8.76 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 322.22, 260.20 ms. Throughput: 6.95 iter/sec.
[Mon Apr 10 10:58:35 2017]
Timings for 4096K FFT length (2 cpus, 1 worker): 152.85 ms. Throughput: 6.54 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 343.11, 248.39 ms. Throughput: 6.94 iter/sec.
Timings for 5120K FFT length (2 cpus, 1 worker): 209.21 ms. Throughput: 4.78 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 474.79, 399.13 ms. Throughput: 4.61 iter/sec.
Timings for 6144K FFT length (2 cpus, 1 worker): 240.27 ms. Throughput: 4.16 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 694.67, 595.62 ms. Throughput: 3.12 iter/sec.
Timings for 7168K FFT length (2 cpus, 1 worker): 805.39 ms. Throughput: 1.24 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 1108.76, 926.34 ms. Throughput: 1.98 iter/sec.
Timings for 8192K FFT length (2 cpus, 1 worker): 1045.18 ms. Throughput: 0.96 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 661.82, 562.99 ms. Throughput: 3.29 iter/sec.
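A note for anyone reading these logs: the Throughput figure appears to be the sum, over all workers, of 1000 divided by that worker's ms per iteration. This is inferred from the numbers above rather than from Prime95's source, so treat it as an assumption. A minimal Python sketch (the function name is made up for illustration) that checks the relation against the two 1024K lines:

```python
def throughput_iter_per_sec(timings_ms):
    """Sum of per-worker iteration rates, in iterations per second.

    Assumption: each worker contributes 1000 / (ms per iteration).
    """
    return sum(1000.0 / t for t in timings_ms)


# Per-worker timings copied from the 1024K lines of the log above.
one_worker = throughput_iter_per_sec([23.38])
two_workers = throughput_iter_per_sec([46.76, 46.03])

print(f"1 worker:  {one_worker:.2f} iter/sec (log reports 42.76)")
print(f"2 workers: {two_workers:.2f} iter/sec (log reports 43.11)")
```

The computed values agree with the logged ones to within the rounding of the printed timings.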
Code:
[Mon Apr 10 11:05:55 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Celeron(R) CPU N2840 @ 2.16GHz
CPU speed: 2558.10 MHz, 2 cores
CPU features: Prefetchw, SSE, SSE2, SSE4
L1 cache size: 24 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Machine topology as determined by hwloc library:
Machine#0 (total=2170248KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
NUMANode#0 (local=2170248KB, total=2170248KB)
Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=55, CPUModel="Intel(R) Celeron(R) CPU N2840 @ 2.16GHz", CPUStepping=8)
L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
Core (cpuset: 0x00000001)
PU#0 (cpuset: 0x00000001)
L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
Core (cpuset: 0x00000002)
PU#1 (cpuset: 0x00000002)
Prime95 64-bit version 29.1, RdtscTiming=1
Best time for 61 bit trial factors: 7.731 ms.
Best time for 62 bit trial factors: 20.862 ms.
Best time for 63 bit trial factors: 15.234 ms.
Best time for 64 bit trial factors: 17.497 ms.
Best time for 65 bit trial factors: 19.764 ms.
Best time for 66 bit trial factors: 19.450 ms.
Best time for 67 bit trial factors: 54.953 ms.
Best time for 75 bit trial factors: 78.660 ms.
Best time for 76 bit trial factors: 1.207 ms.
Best time for 77 bit trial factors: 22.409 ms.
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Celeron(R) CPU N2840 @ 2.16GHz
CPU speed: 2557.81 MHz, 2 cores
CPU features: Prefetchw, SSE, SSE2, SSE4
L1 cache size: 24 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Machine topology as determined by hwloc library:
Machine#0 (total=2170248KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
NUMANode#0 (local=2170248KB, total=2170248KB)
Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=55, CPUModel="Intel(R) Celeron(R) CPU N2840 @ 2.16GHz", CPUStepping=8)
L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
Core (cpuset: 0x00000001)
PU#0 (cpuset: 0x00000001)
L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
Core (cpuset: 0x00000002)
PU#1 (cpuset: 0x00000002)
Prime95 64-bit version 29.1, RdtscTiming=1
Best time for 61 bit trial factors: 7.615 ms.
Best time for 62 bit trial factors: 7.840 ms.
Best time for 63 bit trial factors: 11.009 ms.
Best time for 64 bit trial factors: 13.912 ms.
Best time for 65 bit trial factors: 17.365 ms.
Best time for 66 bit trial factors: 18.360 ms.
Best time for 67 bit trial factors: 18.486 ms.
Best time for 75 bit trial factors: 46.705 ms.
Best time for 76 bit trial factors: 18.084 ms.
Best time for 77 bit trial factors: 19.320 ms.
Luigi
Last fiddled with by ET_ on 2017-04-10 at 09:18 |
|
|
|
|
|
|
#744 |
|
Jan 2003
7×29 Posts |
I posted the results below from my Ryzen 1700 (non-X) in the AMD Zen speculation thread earlier. I thought I'd consolidate them with all the other benchmarks in this thread and also add a bit more detail on the setup.
CPU: AMD Ryzen 1700 (non-X)
Frequency: 3.32GHz @ 1.031V (stock rating 3GHz / Turbo 3.7GHz)
Heatsink: AMD Wraith Spire
Memory: Corsair 8GBx2 @ 2933MHz CAS16 (single rank)
Motherboard: Asus X370-Pro
BIOS: 0604 (AGESA 1.0.0.4a)
Operating system: Windows 10 x64 Creators Update
Prime95 version: 29.1 Build 15
Code:
AMD Ryzen 7 1700 Eight-Core Processor
CPU speed: 3318.72 MHz, 8 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 512 KB, L3 cache size: 16 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1536
Prime95 64-bit version 29.1, RdtscTiming=1
I rearranged the benchmark results below for a bit easier reading / comparison:
Timings for 1024K FFT length (1 cpu, 1 worker): 7.83 ms. Throughput: 127.69 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker): 9.88 ms. Throughput: 101.17 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker): 11.97 ms. Throughput: 83.57 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker): 14.58 ms. Throughput: 68.60 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker): 16.05 ms. Throughput: 62.29 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker): 20.60 ms. Throughput: 48.55 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 24.87 ms. Throughput: 40.20 iter/sec.
Timings for 3584K FFT length (1 cpu, 1 worker): 29.90 ms. Throughput: 33.44 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 34.18 ms. Throughput: 29.26 iter/sec.
Timings for 5120K FFT length (1 cpu, 1 worker): 42.60 ms. Throughput: 23.48 iter/sec.
Timings for 6144K FFT length (1 cpu, 1 worker): 50.67 ms. Throughput: 19.74 iter/sec.
Timings for 7168K FFT length (1 cpu, 1 worker): 60.12 ms. Throughput: 16.63 iter/sec.
Timings for 8192K FFT length (1 cpu, 1 worker): 68.76 ms. Throughput: 14.54 iter/sec.
Timings for 1024K FFT length (8 cpus, 1 worker): 1.13 ms. Throughput: 886.42 iter/sec.
Timings for 1280K FFT length (8 cpus, 1 worker): 1.42 ms. Throughput: 704.55 iter/sec.
Timings for 1536K FFT length (8 cpus, 1 worker): 1.71 ms. Throughput: 584.87 iter/sec.
Timings for 1792K FFT length (8 cpus, 1 worker): 2.10 ms. Throughput: 475.44 iter/sec.
Timings for 2048K FFT length (8 cpus, 1 worker): 2.39 ms. Throughput: 418.60 iter/sec.
Timings for 2560K FFT length (8 cpus, 1 worker): 3.96 ms. Throughput: 252.38 iter/sec.
Timings for 3072K FFT length (8 cpus, 1 worker): 4.97 ms. Throughput: 201.08 iter/sec.
Timings for 3584K FFT length (8 cpus, 1 worker): 5.97 ms. Throughput: 167.51 iter/sec.
Timings for 4096K FFT length (8 cpus, 1 worker): 6.92 ms. Throughput: 144.58 iter/sec.
Timings for 5120K FFT length (8 cpus, 1 worker): 7.32 ms. Throughput: 136.59 iter/sec.
Timings for 6144K FFT length (8 cpus, 1 worker): 9.37 ms. Throughput: 106.71 iter/sec.
Timings for 7168K FFT length (8 cpus, 1 worker): 10.96 ms. Throughput: 91.21 iter/sec.
Timings for 8192K FFT length (8 cpus, 1 worker): 12.69 ms. Throughput: 78.83 iter/sec.
Timings for 1024K FFT length (8 cpus, 8 workers): 11.30, 11.41, 11.28, 11.22, 11.18, 11.18, 11.21, 11.20 ms. Throughput: 711.26 iter/sec.
Timings for 1280K FFT length (8 cpus, 8 workers): 14.15, 14.51, 14.13, 14.15, 14.03, 14.05, 14.13, 14.16 ms. Throughput: 564.84 iter/sec.
Timings for 1536K FFT length (8 cpus, 8 workers): 16.81, 17.45, 16.96, 17.00, 16.84, 16.82, 16.91, 16.82 ms. Throughput: 472.01 iter/sec.
Timings for 1792K FFT length (8 cpus, 8 workers): 20.85, 21.81, 20.92, 21.12, 20.68, 20.92, 21.25, 20.77 ms. Throughput: 380.31 iter/sec.
Timings for 2048K FFT length (8 cpus, 8 workers): 22.60, 23.32, 22.76, 22.78, 22.54, 22.61, 22.61, 22.54 ms. Throughput: 352.17 iter/sec.
Timings for 2560K FFT length (8 cpus, 8 workers): 33.53, 34.97, 33.76, 34.34, 34.01, 33.93, 34.26, 33.98 ms. Throughput: 234.66 iter/sec.
Timings for 3072K FFT length (8 cpus, 8 workers): 41.23, 42.38, 41.51, 40.71, 40.84, 40.78, 40.87, 41.04 ms. Throughput: 194.34 iter/sec.
Timings for 3584K FFT length (8 cpus, 8 workers): 48.09, 49.43, 47.96, 48.77, 47.89, 47.32, 47.90, 47.23 ms. Throughput: 166.45 iter/sec.
Timings for 4096K FFT length (8 cpus, 8 workers): 56.27, 57.15, 55.09, 55.39, 55.64, 54.99, 54.88, 54.69 ms. Throughput: 144.14 iter/sec.
Timings for 5120K FFT length (8 cpus, 8 workers): 58.15, 60.30, 58.03, 57.82, 57.55, 57.00, 58.24, 57.01 ms. Throughput: 137.94 iter/sec.
Timings for 6144K FFT length (8 cpus, 8 workers): 70.59, 72.77, 71.30, 71.76, 70.77, 70.67, 70.83, 70.63 ms. Throughput: 112.43 iter/sec.
Timings for 7168K FFT length (8 cpus, 8 workers): 87.46, 87.18, 83.29, 83.81, 82.80, 83.61, 83.66, 83.11 ms. Throughput: 94.87 iter/sec.
Timings for 8192K FFT length (8 cpus, 8 workers): 99.83, 99.12, 96.13, 97.41, 96.20, 96.03, 96.76, 96.01 ms. Throughput: 82.33 iter/sec.
Last fiddled with by db597 on 2017-04-11 at 08:25 |
|
|
|
|
|
#745 |
|
Romulan Interpreter
Jun 2011
Thailand
3×3,221 Posts |
Well, not exactly the same price range, but as a point of comparison: i7-6950X @ 3.00GHz (yes, underclocked; I am temporarily having problems with cooling, since April is Thai summer, the hottest period of the year, ~45°C outside), with a single worker running on 8 cores (out of 10), at the required FFT size, Prime95 64-bit version 28.10:
<snip>
Timing FFTs using 8 threads on 8 physical CPUs.
<snip>
Best time for 8192K FFT length: 7.136 ms., avg: 7.291 ms.
<snip>
Last fiddled with by LaurV on 2017-04-12 at 14:10 |
|
|
|
|
|
#746 |
|
Jan 2003
203₁₀ Posts |
@LaurV... thanks for the comparison benchmark.
So for the case of both systems running on 8 physical cores, it's 7.136 ms for the i7-6950X @ 3.0GHz vs 12.69 ms for the Ryzen 1700 @ 3.3GHz. Looks like Intel wins big in terms of IPC. It would still be interesting to see the results from an i7-7700K (half the cores, but higher IPC and higher clock speed)... to compare at a similar cost level (a Ryzen 1700 system still being a bit cheaper than a comparable i7-7700K system). |
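To put a number on the per-clock gap, the two 8192K timings can be converted to clock cycles per iteration. The Python sketch below is only a rough normalisation: the function name is made up for illustration, the 3000 MHz figure is the nominal "3.00GHz" quoted for the i7-6950X rather than a measured clock, the two runs used different Prime95 versions (28.10 vs 29.1), and wall-clock cycles include memory stalls, so this is not IPC in the strict sense.

```python
def megacycles_per_iteration(ms_per_iter, clock_mhz):
    """Clock cycles elapsed during one iteration, in millions.

    ms * MHz = 1e-3 s * 1e6 cycles/s = 1e3 cycles, hence the division by 1000.
    """
    return ms_per_iter * clock_mhz / 1000.0


# 8192K FFT, 8 cores, 1 worker; timings and clocks as quoted in the posts above.
# 3000.0 MHz is the nominal clock from the post (an assumption, not a measurement).
i7_6950x = megacycles_per_iteration(7.136, 3000.0)
ryzen_1700 = megacycles_per_iteration(12.69, 3318.72)

print(f"i7-6950X:   {i7_6950x:.1f} Mcycles/iteration")
print(f"Ryzen 1700: {ryzen_1700:.1f} Mcycles/iteration")
print(f"ratio:      {ryzen_1700 / i7_6950x:.2f}")
```

Under these assumptions the Ryzen 1700 spends roughly twice as many clock cycles per iteration as the i7-6950X at this FFT size.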
|
|
|
|
|
#747 |
|
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts |
I don't understand this reading of the CPU speed. The CPU was, and is, running at 4.20GHz.
RAM is at 3200MHz. Code:
[Wed Apr 12 22:03:33 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
CPU speed: 4008.14 MHz, 4 hyperthreaded cores
CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Machine topology as determined by hwloc library:
Machine#0 (total=12649168KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
NUMANode#0 (local=12649168KB, total=12649168KB)
Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=94, CPUModel="Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz", CPUStepping=3)
L3 (size=8192KB, linesize=64, ways=16, Inclusive=1)
L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000003)
PU#0 (cpuset: 0x00000001)
PU#1 (cpuset: 0x00000002)
L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x0000000c)
PU#2 (cpuset: 0x00000004)
PU#3 (cpuset: 0x00000008)
L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000030)
PU#4 (cpuset: 0x00000010)
PU#5 (cpuset: 0x00000020)
L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x000000c0)
PU#6 (cpuset: 0x00000040)
PU#7 (cpuset: 0x00000080)
Prime95 64-bit version 29.1, RdtscTiming=1
Timings for 1024K FFT length (1 cpu, 1 worker): 3.18 ms. Throughput: 314.28 iter/sec.
Timings for 1024K FFT length (2 cpus, 1 worker): 1.67 ms. Throughput: 599.56 iter/sec.
Timings for 1024K FFT length (3 cpus, 1 worker): 1.13 ms. Throughput: 888.71 iter/sec.
Timings for 1024K FFT length (4 cpus, 1 worker): 0.86 ms. Throughput: 1161.54 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker): 4.04 ms. Throughput: 247.48 iter/sec.
Timings for 1280K FFT length (2 cpus, 1 worker): 2.09 ms. Throughput: 478.34 iter/sec.
Timings for 1280K FFT length (3 cpus, 1 worker): 1.44 ms. Throughput: 695.49 iter/sec.
Timings for 1280K FFT length (4 cpus, 1 worker): 1.11 ms. Throughput: 900.27 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker): 4.89 ms. Throughput: 204.35 iter/sec.
Timings for 1536K FFT length (2 cpus, 1 worker): 2.54 ms. Throughput: 394.47 iter/sec.
Timings for 1536K FFT length (3 cpus, 1 worker): 1.73 ms. Throughput: 579.18 iter/sec.
Timings for 1536K FFT length (4 cpus, 1 worker): 1.38 ms. Throughput: 724.80 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker): 6.14 ms. Throughput: 162.89 iter/sec.
Timings for 1792K FFT length (2 cpus, 1 worker): 3.24 ms. Throughput: 308.59 iter/sec.
Timings for 1792K FFT length (3 cpus, 1 worker): 2.17 ms. Throughput: 461.04 iter/sec.
Timings for 1792K FFT length (4 cpus, 1 worker): 1.70 ms. Throughput: 588.90 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker): 6.52 ms. Throughput: 153.46 iter/sec.
Timings for 2048K FFT length (2 cpus, 1 worker): 3.41 ms. Throughput: 292.96 iter/sec.
Timings for 2048K FFT length (3 cpus, 1 worker): 2.36 ms. Throughput: 423.56 iter/sec.
Timings for 2048K FFT length (4 cpus, 1 worker): 1.94 ms. Throughput: 515.17 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker): 8.59 ms. Throughput: 116.35 iter/sec.
Timings for 2560K FFT length (2 cpus, 1 worker): 4.50 ms. Throughput: 222.19 iter/sec.
Timings for 2560K FFT length (3 cpus, 1 worker): 3.05 ms. Throughput: 327.92 iter/sec.
Timings for 2560K FFT length (4 cpus, 1 worker): 2.45 ms. Throughput: 408.69 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 10.24 ms. Throughput: 97.65 iter/sec.
Timings for 3072K FFT length (2 cpus, 1 worker): 5.27 ms. Throughput: 189.81 iter/sec.
Timings for 3072K FFT length (3 cpus, 1 worker): 3.62 ms. Throughput: 276.07 iter/sec.
Timings for 3072K FFT length (4 cpus, 1 worker): 2.95 ms. Throughput: 339.20 iter/sec.
[Wed Apr 12 22:08:44 2017]
Timings for 3584K FFT length (1 cpu, 1 worker): 12.36 ms. Throughput: 80.90 iter/sec.
Timings for 3584K FFT length (2 cpus, 1 worker): 6.34 ms. Throughput: 157.62 iter/sec.
Timings for 3584K FFT length (3 cpus, 1 worker): 4.33 ms. Throughput: 230.80 iter/sec.
Timings for 3584K FFT length (4 cpus, 1 worker): 3.53 ms. Throughput: 283.48 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 14.18 ms. Throughput: 70.50 iter/sec.
Timings for 4096K FFT length (2 cpus, 1 worker): 7.33 ms. Throughput: 136.44 iter/sec.
Timings for 4096K FFT length (3 cpus, 1 worker): 5.01 ms. Throughput: 199.63 iter/sec.
Timings for 4096K FFT length (4 cpus, 1 worker): 4.07 ms. Throughput: 245.91 iter/sec.
|
|
|
|
|
|
#748 | |
|
"James Heinrich"
May 2004
ex-Northern Ontario
23·149 Posts |
Quote:
I would guess that Prime95 reads the processor frequency on startup before starting the actual benchmark, and turbo doesn't kick in until the CPU is under load. You could use your favourite monitoring utility (e.g. CPU-Z) to monitor CPU frequency in realtime and see how it changes as you start/run the benchmark. |
|
|
|
|
|
|
#749 | |
|
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts |
Quote:
I will have to watch when it goes to max as the benchmark starts. That will include trying to limit other loads (like shutting down multi-tabbed Firefox), and maybe even mfaktc, to limit whatever memory contention might arise. Of course, limiting other loads is not a Real Life® situation either. Most of the time I expect this machine to do a bunch of other stuff when I want it to, and this certainly takes a toll on the ms/it. Last fiddled with by kladner on 2017-04-13 at 04:10 |
|
|
|
|
|
|
#750 |
|
Jan 2003
CB₁₆ Posts |
So from the benchmarks it looks like 8 Ryzen cores are still slower than 4 Skylake/Kaby Lake cores:
Ryzen @ 3.3GHz: Code:
Timings for 4096K FFT length (8 cpus, 1 worker): 6.92 ms. Throughput: 144.58 iter/sec.
i7-6700K (figure from post #747): Code:
Timings for 4096K FFT length (4 cpus, 1 worker): 4.07 ms. Throughput: 245.91 iter/sec. |
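One way to read the 4096K comparison is to look at how well each chip scales a single worker from one core to all of its cores. The Python sketch below uses only timings posted earlier in this thread (34.18 ms and 6.92 ms for the Ryzen 1700, 14.18 ms and 4.07 ms for the i7-6700K). The function name is made up for illustration, and dividing speedup by core count is just one common definition of parallel efficiency.

```python
def scaling(ms_one_core, ms_n_cores, n_cores):
    """Return (speedup, parallel efficiency) for one worker spread over n cores."""
    speedup = ms_one_core / ms_n_cores
    return speedup, speedup / n_cores


# 4096K FFT timings copied from the posts in this thread.
ryzen_speedup, ryzen_eff = scaling(34.18, 6.92, 8)       # Ryzen 1700, 8 cores
skylake_speedup, skylake_eff = scaling(14.18, 4.07, 4)   # i7-6700K, 4 cores

print(f"Ryzen 1700: {ryzen_speedup:.2f}x on 8 cores, efficiency {ryzen_eff:.0%}")
print(f"i7-6700K:   {skylake_speedup:.2f}x on 4 cores, efficiency {skylake_eff:.0%}")
```

On these figures the Ryzen 1700 gets a bit under 5x from 8 cores while the i7-6700K gets about 3.5x from 4, so part of the gap at this FFT size comes from multi-core scaling and not only from single-core speed.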
|
|
|