mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware

Old 2016-07-28, 18:09   #701
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

3×5×313 Posts

And finally, I assume they also must be the same speed?

To interleave that is.

Or if I find the current 8GB stick is 1600 and I add a 2133 8GB stick, will they still interleave and give full dual-channel memory sharing?
Old 2016-07-28, 18:37   #702
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

110101100011₂ Posts

Probably, but they'll both run at 1600, so you're wasting your money on faster RAM.
Old 2016-07-31, 17:36   #703
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

3·5·313 Posts

Quote:
Originally Posted by petrw1 View Post
To interleave that is.

Or if I find the current 8GB stick is 1600 and I add a 2133 8Gb stick will it still interleave and get full dual-channel memory sharing?
He opened the box and found the RAM was DDR3 833 (is that even possible nowadays?).

Anyway, the benchmark gave him 7.68 ms for a 2048K FFT on 1 core (equivalent to other similar CPUs), with actual speeds of 16 ms when all 4 cores run concurrently. Can I expect faster RAM to make it a lot faster? I'm thinking 2400 or even 2133 DDR3 may be overkill, more than I need.

Thoughts....
Old 2016-07-31, 18:09   #704
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

6543₈ Posts

Quote:
Originally Posted by petrw1 View Post
found the RAM was DDR3 833 (is that even possible nowadays)
No?
I believe the spec starts at 800 and goes up in 266 increments (800, 1066, 1333, 1600, 1866, 2133, 2400, etc).
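That stepping can be reproduced with a quick sketch; the rule that the grade names truncate the fractional "thirds of 800" is an assumption, but it matches the published numbers:

```python
# DDR3 data rates step by ~266 MT/s because the underlying I/O clocks
# step in thirds (400, 533, 667, 800, ... MHz), doubled by double data rate.
def ddr3_speed_grades(count=7):
    # Integer truncation gives the familiar names: 3200/3 -> 1066, 5600/3 -> 1866
    return [800 * k // 3 for k in range(3, 3 + count)]

print(ddr3_speed_grades())
# [800, 1066, 1333, 1600, 1866, 2133, 2400]
```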
Old 2016-08-01, 03:45   #705
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

1257₁₆ Posts

Quote:
Originally Posted by James Heinrich View Post
No?
I believe the spec starts at 800 and goes up in 266 increments (800, 1066, 1333, 1600, 1866, 2133, 2400, etc).
I stand corrected....800 it is.


Same question though: is it possible that RAM could be capable of 7.68 ms times for a 2048K FFT?

If so, at what RAM speed do I no longer get improvements, i.e. at what point does the MB or CPU, etc. become the bottleneck?
Old 2016-08-01, 04:09   #706
sdbardwick
Aug 2002
North San Diego County

685₁₀ Posts

Might be 1333 rather than 800; tiny "1" in close proximity to "3" looks like an "8".
Old 2016-08-01, 04:45   #707
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

101101110010₂ Posts

Well, an i7-6700 probably takes DDR4. The lowest common speed for DDR4 is 2133 MHz.

Get a stick of DDR4-2133 that matches the size of one of the existing sticks.
Old 2016-08-01, 09:10   #708
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

7·29² Posts

Quote:
Originally Posted by Mark Rose View Post
Well, a i7-6700 probably takes DDR4. The lowest common speed for DDR4 is 2133 MHz.

Get a stick of DDR4-2133 that matches the size of one of the existing sticks.
Assuming the motherboard does. It is unlikely that you would be able to replace DDR3 with DDR4; the sockets at least are different. There may be motherboards that support both.
Old 2016-08-26, 21:48   #709
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

3×5×313 Posts

Expert opinion?

2×8GB DDR3, 1866 and 2400, both under $100 ... what's the catch?

http://www.newegg.ca/Product/Product...82E16820104467

http://www.newegg.ca/Product/Product...82E16820233585
Old 2016-08-26, 22:19   #710
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

23×149 Posts

11-13-13 vs 10-11-10 CAS timings
Old 2016-08-26, 22:23   #711
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

4695₁₀ Posts

Quote:
Originally Posted by James Heinrich View Post
11-13-13 vs 10-11-10 CAS timings
Yeah, I noticed, but it didn't seem like a lot.
Does that CAS difference make up for the speed difference?
In other words, which is likely to perform better for GIMPS?

Last fiddled with by petrw1 on 2016-08-26 at 22:23
Old 2016-08-26, 22:26   #712
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

23×149 Posts

If your board can actually support DDR3-2400 then you're probably better with CAS11@2400 than CAS10@1866, but whether it will make any appreciable difference I can't say. I'm sure others can.
Old 2016-08-26, 22:47   #713
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

1001001010111₂ Posts

Quote:
Originally Posted by James Heinrich View Post
If your board can actually support DDR3-2400 then you're probably better with CAS11@2400 than CAS10@1866, but whether it will make any appreciable difference I can't say. I'm sure others can.
Can anyone easily tell me if this will support 2400 DDR3?

Acer Aspire T3-710 V:1.1
Old 2016-08-26, 22:58   #714
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

2×5×293 Posts

Quote:
Originally Posted by petrw1 View Post
Can anyone easily tell me if this will support 2400 DDR3?

Acer Aspire T3-710 V:1.1
This machine? http://www.acer.com/ac/en/SG/content/model/DT.B1HSG.001

It takes DDR3L-1600 at 1.35V.

The memory at both those links won't work. Try this:

http://www.newegg.ca/Product/Product...82E16820156047
Old 2016-08-26, 23:00   #715
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

D63₁₆ Posts

Quote:
Originally Posted by petrw1 View Post
Acer Aspire
As a generalization: Probably not. "Brand name" systems tend to be equipped with parts that are just sufficient to run at their specified settings and no more. Greater abilities mean greater cost, and Acer's not going to pay for 2400MHz RAM support when they're only planning on putting in 1333 (or whatever the stock speed is). If you didn't pick the motherboard and build the system yourself (or have a friend/store build it for you), then I wouldn't expect anything above 1600 (if that) to be supported on any brand-name system.

Last fiddled with by James Heinrich on 2016-08-26 at 23:01
Old 2016-08-27, 00:59   #716
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

2×5×293 Posts

i5-6600 with DDR4-2133:

[Work thread Aug 24 10:24] Benchmarking multiple workers to measure the impact of memory bandwidth
[Work thread Aug 24 10:27] Timing 2048K FFT, 4 cpus, 1 worker. Average times: 2.57 ms. Total throughput: 388.96 iter/sec.
[Work thread Aug 24 10:27] Timing 2048K FFT, 4 cpus, 2 workers. Average times: 5.30, 5.30 ms. Total throughput: 377.27 iter/sec.
[Work thread Aug 24 10:27] Timing 2048K FFT, 4 cpus, 4 workers. Average times: 10.71, 10.71, 10.71, 10.71 ms. Total throughput: 373.41 iter/sec.
...
[Work thread Aug 24 10:29] Timing 4096K FFT, 4 cpus, 1 worker. Average times: 5.40 ms. Total throughput: 185.15 iter/sec.
[Work thread Aug 24 10:29] Timing 4096K FFT, 4 cpus, 2 workers. Average times: 10.88, 10.78 ms. Total throughput: 184.67 iter/sec.
[Work thread Aug 24 10:29] Timing 4096K FFT, 4 cpus, 4 workers. Average times: 21.67, 21.70, 21.65, 21.64 ms. Total throughput: 184.64 iter/sec.
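As an aside, the "Total throughput" figure is just the per-worker iteration rates summed. A small sketch (not Prime95 code; `total_throughput` is a made-up helper) reproduces the 2048K, 4-worker line above to within rounding:

```python
def total_throughput(avg_times_ms):
    """Sum each worker's iteration rate: 1000 ms/s divided by ms per iteration."""
    return sum(1000.0 / t for t in avg_times_ms)

# 2048K FFT, 4 workers at ~10.71 ms each:
print(round(total_throughput([10.71, 10.71, 10.71, 10.71]), 2))  # 373.48
```

The log prints 373.41 because Prime95 sums the unrounded per-worker times; feeding the helper the already-rounded 10.71 ms values gives 373.48.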

Stock-clocked i7-4770k with DDR3-2400:

[Work thread Aug 24 10:30] Timing 2048K FFT, 4 cpus, 1 worker. Average times: 2.63 ms. Total throughput: 380.55 iter/sec.
[Work thread Aug 24 10:30] Timing 2048K FFT, 4 cpus, 2 workers. Average times: 5.90, 5.58 ms. Total throughput: 348.60 iter/sec.
[Work thread Aug 24 10:30] Timing 2048K FFT, 4 cpus, 4 workers. Average times: 10.49, 10.42, 10.54, 10.42 ms. Total throughput: 382.13 iter/sec.
[Work thread Aug 24 10:31] Timing 2560K FFT, 4 cpus, 1 worker. Average times: 3.56 ms. Total throughput: 280.91 iter/sec.
[Work thread Aug 24 10:31] Timing 2560K FFT, 4 cpus, 2 workers. Average times: 6.81, 6.80 ms. Total throughput: 294.08 iter/sec.
[Work thread Aug 24 10:31] Timing 2560K FFT, 4 cpus, 4 workers. Average times: 13.53, 13.56, 13.67, 13.58 ms. Total throughput: 294.46 iter/sec.
[Work thread Aug 24 10:31] Timing 3072K FFT, 4 cpus, 1 worker. Average times: 4.36 ms. Total throughput: 229.51 iter/sec.
[Work thread Aug 24 10:31] Timing 3072K FFT, 4 cpus, 2 workers. Average times: 8.07, 8.09 ms. Total throughput: 247.56 iter/sec.
[Work thread Aug 24 10:32] Timing 3072K FFT, 4 cpus, 4 workers. Average times: 16.12, 16.11, 16.25, 16.13 ms. Total throughput: 247.63 iter/sec.
[Work thread Aug 24 10:32] Timing 3584K FFT, 4 cpus, 1 worker. Average times: 4.81 ms. Total throughput: 207.94 iter/sec.
[Work thread Aug 24 10:32] Timing 3584K FFT, 4 cpus, 2 workers. Average times: 9.69, 9.66 ms. Total throughput: 206.67 iter/sec.
[Work thread Aug 24 10:32] Timing 3584K FFT, 4 cpus, 4 workers. Average times: 19.07, 18.99, 18.99, 19.09 ms. Total throughput: 210.14 iter/sec.
[Work thread Aug 24 10:32] Timing 4096K FFT, 4 cpus, 1 worker. Average times: 5.51 ms. Total throughput: 181.46 iter/sec.
[Work thread Aug 24 10:33] Timing 4096K FFT, 4 cpus, 2 workers. Average times: 10.85, 10.83 ms. Total throughput: 184.52 iter/sec.
[Work thread Aug 24 10:33] Timing 4096K FFT, 4 cpus, 4 workers. Average times: 21.96, 21.98, 21.76, 21.81 ms. Total throughput: 182.84 iter/sec.

And a 4770 with DDR3-1600:

[Work thread Aug 26 20:48] Timing 1024K FFT, 4 cpus, 1 worker. Average times: 1.31 ms. Total throughput: 760.95 iter/sec.
[Work thread Aug 26 20:48] Timing 1024K FFT, 4 cpus, 2 workers. Average times: 3.30, 3.70 ms. Total throughput: 573.21 iter/sec.
[Work thread Aug 26 20:48] Timing 1024K FFT, 4 cpus, 4 workers. Average times: 7.04, 7.16, 7.98, 7.27 ms. Total throughput: 544.37 iter/sec.
[Work thread Aug 26 20:48] Timing 1280K FFT, 4 cpus, 1 worker. Average times: 1.99 ms. Total throughput: 502.73 iter/sec.
[Work thread Aug 26 20:48] Timing 1280K FFT, 4 cpus, 2 workers. Average times: 4.38, 4.43 ms. Total throughput: 453.94 iter/sec.
[Work thread Aug 26 20:48] Timing 1280K FFT, 4 cpus, 4 workers. Average times: 9.18, 9.07, 9.10, 9.17 ms. Total throughput: 438.16 iter/sec.
[Work thread Aug 26 20:49] Timing 1536K FFT, 4 cpus, 1 worker. Average times: 2.43 ms. Total throughput: 411.77 iter/sec.
[Work thread Aug 26 20:49] Timing 1536K FFT, 4 cpus, 2 workers. Average times: 5.29, 5.44 ms. Total throughput: 372.94 iter/sec.
[Work thread Aug 26 20:49] Timing 1536K FFT, 4 cpus, 4 workers. Average times: 10.91, 10.94, 11.06, 11.04 ms. Total throughput: 363.99 iter/sec.
[Work thread Aug 26 20:49] Timing 1792K FFT, 4 cpus, 1 worker. Average times: 3.06 ms. Total throughput: 326.68 iter/sec.
[Work thread Aug 26 20:49] Timing 1792K FFT, 4 cpus, 2 workers. Average times: 6.55, 6.56 ms. Total throughput: 305.11 iter/sec.
[Work thread Aug 26 20:50] Timing 1792K FFT, 4 cpus, 4 workers. Average times: 13.18, 13.22, 13.06, 13.43 ms. Total throughput: 302.52 iter/sec.
[Work thread Aug 26 20:50] Timing 2048K FFT, 4 cpus, 1 worker. Average times: 3.49 ms. Total throughput: 286.67 iter/sec.
[Work thread Aug 26 20:50] Timing 2048K FFT, 4 cpus, 2 workers. Average times: 8.08, 7.51 ms. Total throughput: 256.88 iter/sec.
[Work thread Aug 26 20:50] Timing 2048K FFT, 4 cpus, 4 workers. Average times: 14.76, 15.04, 14.82, 15.09 ms. Total throughput: 267.96 iter/sec.
[Work thread Aug 26 20:50] Timing 2560K FFT, 4 cpus, 1 worker. Average times: 4.63 ms. Total throughput: 216.13 iter/sec.
[Work thread Aug 26 20:51] Timing 2560K FFT, 4 cpus, 2 workers. Average times: 9.50, 9.62 ms. Total throughput: 209.26 iter/sec.
[Work thread Aug 26 20:51] Timing 2560K FFT, 4 cpus, 4 workers. Average times: 18.46, 19.41, 19.10, 18.90 ms. Total throughput: 210.98 iter/sec.
[Work thread Aug 26 20:51] Timing 3072K FFT, 4 cpus, 1 worker. Average times: 5.66 ms. Total throughput: 176.75 iter/sec.
[Work thread Aug 26 20:51] Timing 3072K FFT, 4 cpus, 2 workers. Average times: 11.60, 11.86 ms. Total throughput: 170.48 iter/sec.
[Work thread Aug 26 20:51] Timing 3072K FFT, 4 cpus, 4 workers. Average times: 23.17, 22.32, 22.55, 23.19 ms. Total throughput: 175.43 iter/sec.
[Work thread Aug 26 20:52] Timing 3584K FFT, 4 cpus, 1 worker. Average times: 6.68 ms. Total throughput: 149.66 iter/sec.
[Work thread Aug 26 20:52] Timing 3584K FFT, 4 cpus, 2 workers. Average times: 13.62, 13.55 ms. Total throughput: 147.24 iter/sec.
[Work thread Aug 26 20:52] Timing 3584K FFT, 4 cpus, 4 workers. Average times: 26.65, 26.95, 27.47, 27.59 ms. Total throughput: 147.27 iter/sec.
[Work thread Aug 26 20:52] Timing 4096K FFT, 4 cpus, 1 worker. Average times: 8.05 ms. Total throughput: 124.17 iter/sec.
[Work thread Aug 26 20:52] Timing 4096K FFT, 4 cpus, 2 workers. Average times: 15.30, 15.29 ms. Total throughput: 130.74 iter/sec.
[Work thread Aug 26 20:53] Timing 4096K FFT, 4 cpus, 4 workers. Average times: 30.15, 30.57, 30.68, 30.75 ms. Total throughput: 131.00 iter/sec.
Old 2016-08-31, 16:53   #717
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

4695₁₀ Posts

Quote:
Originally Posted by Mark Rose View Post
This machine? http://www.acer.com/ac/en/SG/content/model/DT.B1HSG.001

It takes DDR3L-1600 at 1.35V.

The memory at both those links won't work. Try this:

http://www.newegg.ca/Product/Product...82E16820156047
I really appreciate the help (from all of you) but what in the first link tells me it can only handle DDR3-1600 at 1.35V?
Old 2016-08-31, 17:25   #718
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

5562₈ Posts

Quote:
Originally Posted by petrw1 View Post
I really appreciate the help (from all of you) but what in the first link tells me it can only handle DDR3-1600 at 1.35V?
It's a specification of the CPU. http://ark.intel.com/products/88185/...up-to-3_30-GHz

When it's being used with DDR3, which is what the Acer page says the motherboard takes, it's limited to 1600 MHz and it must be low voltage DDR3, or DDR3L.
Old 2016-09-12, 04:47   #719
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

3·5·313 Posts

Quote:
Originally Posted by Mark Rose View Post
It's a specification of the CPU. http://ark.intel.com/products/88185/...up-to-3_30-GHz

When it's being used with DDR3, which is what the Acer page says the motherboard takes, it's limited to 1600 MHz and it must be low voltage DDR3, or DDR3L.
So we ordered and installed 2×8GB DDR3-1600, but CPU-Z says it is running at 800. Is it a simple motherboard-settings thing, or are we SOL?

The before and after benchmarks are virtually the same, but 4 cores doing DC-LL are about 12% faster due to balanced dual-channel RAM.
Old 2016-09-12, 13:16   #720
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

7×29² Posts

Quote:
Originally Posted by petrw1 View Post
So we ordered and installed 2X8G DDR3 - 1600 but CPU-Z says it is running at 800. Is it a simple MB-Settings thing or are we SOL.

The before and after benchmarks are virtually the same but 4 cores doing DC-LL is about 12% faster due to balanced Dual RAM.
Cpu-z usually reports half the speed.
Old 2016-09-12, 16:18   #721
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

4695₁₀ Posts

Quote:
Originally Posted by petrw1 View Post
So we ordered and installed 2X8G DDR3 - 1600 but CPU-Z says it is running at 800. Is it a simple MB-Settings thing or are we SOL.

The before and after benchmarks are virtually the same but 4 cores doing DC-LL is about 12% faster due to balanced Dual RAM.
Correction: 25% faster.

17 ms down to under 13 ms on a 37.5M DC across all 4 cores.

The benchmark says it should be 13.5 ms for a 2048K FFT with 4 cores... seems it is doing a little better
(7.68 ms for 1 core alone).

Last fiddled with by petrw1 on 2016-09-12 at 16:24
Old 2016-09-12, 16:46   #722
Antonio
"Antonio Key"
Sep 2011
UK

3²·59 Posts

Quote:
Originally Posted by henryzz View Post
Cpu-z usually reports half the speed.
CPU-Z gives the memory clock speed correctly. The factor-of-2 difference comes from DDR memory transferring data on both the rising and the falling edge of the clock.
Old 2016-09-12, 17:22   #723
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

3×5×313 Posts

Quote:
Originally Posted by Antonio View Post
Cpu-z gives the memory clock speed correctly. The factor of 2 difference comes from DDR memory transferring data on both the rising and the falling edge of the clock.
So to clarify: if CPU-Z says 800, then for some reason my RAM is running at 800 and not the 1600 it is capable of?

If so, is there anything I can do to get to that speed?
Or will that make no difference, that is, will the MB or some other component limit me to 800 anyway?
Old 2016-09-12, 17:27   #724
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,693 Posts

Quote:
Originally Posted by petrw1 View Post
So to clarify if CPU-Z says 800 then for some reason my RAM is running at 800 and not 1600 as it is capable of?

If so is there anything I can do to get to that speed?
Or will that make no difference, that is, will the MB or some other component limit me to 800 anyway?
Your RAM is running at the correct speed. As explained, DDR RAM performs two transfers per clock cycle. CPU-Z reports the base clock, not the effective data rate.
Old 2016-09-13, 03:40   #725
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

3427₁₀ Posts

https://en.wikipedia.org/wiki/Double_data_rate

The RAM is running on an 800 MHz (million cycles per second) clock and transferring data at 1600 MT/s (million transfers per second). Both numbers are correct, even if that is confusing.
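The same relationship in numbers, as a minimal sketch (the helper names are made up; the bandwidth figure assumes a standard 64-bit DDR3 channel):

```python
def ddr_mt_per_s(clock_mhz):
    # DDR transfers on both the rising and falling clock edges
    return 2 * clock_mhz

def peak_mb_per_s(mt_per_s, bus_bits=64, channels=1):
    # each transfer moves bus_bits / 8 bytes per channel
    return mt_per_s * (bus_bits // 8) * channels

rate = ddr_mt_per_s(800)                # 800 MHz clock -> 1600 MT/s, "DDR3-1600"
print(rate)                             # 1600
print(peak_mb_per_s(rate))              # 12800 MB/s, hence the "PC3-12800" label
print(peak_mb_per_s(rate, channels=2))  # 25600 MB/s in dual channel
```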
Old 2016-11-16, 00:00   #726
storm5510
Random Account
Aug 2009

11110101011₂ Posts

Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
CPU speed: 3557.21 MHz, 4 cores
CPU features: 3DNow!, SSE, SSE2, SSE4, AVX
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.10, RdtscTiming=1

Timing FFTs using 1 thread.
Best time for 1024K FFT length: 9.508 ms., avg: 9.768 ms.
Best time for 1280K FFT length: 12.303 ms., avg: 12.418 ms.
Best time for 1536K FFT length: 14.999 ms., avg: 15.142 ms.
Best time for 1792K FFT length: 18.287 ms., avg: 18.356 ms.
Best time for 2048K FFT length: 20.227 ms., avg: 20.361 ms.
Best time for 2560K FFT length: 26.262 ms., avg: 26.380 ms.
Best time for 3072K FFT length: 31.668 ms., avg: 31.762 ms.
Best time for 3584K FFT length: 38.144 ms., avg: 38.364 ms.
Best time for 4096K FFT length: 42.237 ms., avg: 42.404 ms.
Best time for 5120K FFT length: 54.871 ms., avg: 54.996 ms.
Best time for 6144K FFT length: 68.655 ms., avg: 68.826 ms.
Best time for 7168K FFT length: 82.420 ms., avg: 82.663 ms.
Best time for 8192K FFT length: 90.886 ms., avg: 91.456 ms.

Timing FFTs using 2 threads.
Best time for 1024K FFT length: 4.864 ms., avg: 4.918 ms.
Best time for 1280K FFT length: 6.314 ms., avg: 6.388 ms.
Best time for 1536K FFT length: 7.647 ms., avg: 7.741 ms.
Best time for 1792K FFT length: 9.385 ms., avg: 9.449 ms.
Best time for 2048K FFT length: 10.340 ms., avg: 10.423 ms.
Best time for 2560K FFT length: 13.370 ms., avg: 13.465 ms.
Best time for 3072K FFT length: 16.091 ms., avg: 16.292 ms.
Best time for 3584K FFT length: 19.393 ms., avg: 19.624 ms.
Best time for 4096K FFT length: 21.453 ms., avg: 21.588 ms.
Best time for 5120K FFT length: 27.850 ms., avg: 28.476 ms.
Best time for 6144K FFT length: 34.854 ms., avg: 35.100 ms.
Best time for 7168K FFT length: 41.837 ms., avg: 42.006 ms.
Best time for 8192K FFT length: 46.029 ms., avg: 46.188 ms.

Timing FFTs using 3 threads.
Best time for 1024K FFT length: 3.412 ms., avg: 3.462 ms.
Best time for 1280K FFT length: 4.457 ms., avg: 4.533 ms.
Best time for 1536K FFT length: 5.287 ms., avg: 5.401 ms.
Best time for 1792K FFT length: 6.556 ms., avg: 6.645 ms.
Best time for 2048K FFT length: 7.277 ms., avg: 7.350 ms.
Best time for 2560K FFT length: 9.316 ms., avg: 9.495 ms.
Best time for 3072K FFT length: 11.275 ms., avg: 11.354 ms.
Best time for 3584K FFT length: 13.431 ms., avg: 13.660 ms.
Best time for 4096K FFT length: 14.977 ms., avg: 15.127 ms.
Best time for 5120K FFT length: 19.226 ms., avg: 19.463 ms.
Best time for 6144K FFT length: 24.403 ms., avg: 24.689 ms.
Best time for 7168K FFT length: 28.934 ms., avg: 29.159 ms.
Best time for 8192K FFT length: 31.774 ms., avg: 32.199 ms.

Timing FFTs using 4 threads.
Best time for 1024K FFT length: 2.730 ms., avg: 2.804 ms.
Best time for 1280K FFT length: 3.577 ms., avg: 3.688 ms.
Best time for 1536K FFT length: 4.234 ms., avg: 4.320 ms.
Best time for 1792K FFT length: 5.218 ms., avg: 5.423 ms.
Best time for 2048K FFT length: 5.985 ms., avg: 6.203 ms.
Best time for 2560K FFT length: 7.661 ms., avg: 7.855 ms.
Best time for 3072K FFT length: 9.270 ms., avg: 9.394 ms.
Best time for 3584K FFT length: 11.039 ms., avg: 11.281 ms.
Best time for 4096K FFT length: 12.421 ms., avg: 12.651 ms.
Best time for 5120K FFT length: 15.811 ms., avg: 15.986 ms.
Best time for 6144K FFT length: 20.161 ms., avg: 20.647 ms.
Best time for 7168K FFT length: 23.814 ms., avg: 24.115 ms.
Best time for 8192K FFT length: 26.010 ms., avg: 26.269 ms.

Timings for 1024K FFT length (4 cpus, 4 workers): 11.17, 11.15, 11.15, 11.16 ms. Throughput: 358.47 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 14.62, 14.55, 14.55, 14.57 ms. Throughput: 274.43 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 16.92, 16.81, 16.74, 16.74 ms. Throughput: 238.06 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 20.66, 20.37, 20.73, 20.36 ms. Throughput: 194.87 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 26.05, 25.39, 26.14, 26.14 ms. Throughput: 154.29 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 28.62, 28.13, 28.07, 28.23 ms. Throughput: 141.54 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 37.11, 38.15, 36.84, 37.20 ms. Throughput: 107.18 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 42.48, 42.02, 41.94, 42.10 ms. Throughput: 94.94 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 47.66, 47.15, 46.92, 46.69 ms. Throughput: 84.92 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 61.24, 60.11, 59.79, 60.09 ms. Throughput: 66.33 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 75.05, 74.09, 73.57, 74.10 ms. Throughput: 53.91 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 92.21, 90.88, 90.51, 91.44 ms. Throughput: 43.83 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 106.61, 108.07, 111.84, 106.36 ms. Throughput: 36.98 iter/sec.
Old 2016-12-26, 06:01   #727
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

2×5×293 Posts

I wanted to see how changing the CPU frequency would affect the power consumption of my i5-6600 DDR4-2133 compute cluster. I created a spreadsheet with timings of 4 threads and 4 workers.

We've noticed before that Skylake gets more throughput using 4 threads in a single worker instead of 4 workers, but it turns out that changes with CPU frequency. When I underclock the CPU to 3.3 GHz, using 4 workers begins to be more performant.

Spreadsheet.

Underclocking is proving to be an efficiency win. I haven't yet played with undervolting.

I also intend to gather more data while running 3 threads/workers. So far 4 threads/workers at 3.3 GHz is faster than 3 threads at 3.7 GHz. I haven't yet disabled a core in the BIOS to check power consumption differences. I strongly suspect 4 cores will win.
Old 2016-12-26, 09:07   #728
S485122
"Jacob"
Sep 2006
Brussels, Belgium

2·3²·5·19 Posts

Quote:
Originally Posted by Mark Rose View Post
...
Underclocking is proving to be an efficiency win. I haven't yet played with undervolting.
...
This is why the low-voltage versions of those CPUs are interesting: the i5-6600T, for instance.

Jacob
Old 2016-12-26, 13:43   #729
axn
Jun 2003

5,087 Posts

Quote:
Originally Posted by Mark Rose View Post
We've noticed before Skylake gets more throughput using 4 threads instead of 4 workers, but it turns out that changes with CPU frequency. When I underclock the CPU to 3.3 GHz, using 4 workers begins to be more performant.
Based on the data, I think this statement is too strong. Even at 3.3 GHz, 1 worker is more performant than 4 workers, except for 3 FFTs where the difference is so small as to be negligible. Of course, the trend is there, so maybe 3.2 GHz and below might show a clear superiority for 4 workers.
Old 2016-12-26, 14:03   #730
lavalamp
Oct 2007
Manchester, UK

23×59 Posts

Of course, the real metric should be performance / total cost of ownership.

Add up the cost of the computer, as well as the cost of all previously used, and expected future use of electricity (based on chosen clock speed). Optionally also include the cost of your time working on setting up and configuring the machine. :)
Old 2016-12-26, 14:28   #731
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair

2²×3²×173 Posts

Quote:
Originally Posted by lavalamp View Post
Add up the cost of the computer, as well as the cost of all previously used, and expected future use of electricity (based on chosen clock speed). Optionally also include the cost of your time working on setting up and configuring the machine. :)
And if you lose one week of work trying to figure out the sweet spot and gain 3% then you have to work 24/7 for 32 weeks just to break even.
Old 2016-12-26, 19:30   #732
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

2·5·293 Posts

Quote:
Originally Posted by axn View Post
Based on the data, I think this statement is too strong. Even at 3.3 GHz, 1 worker is more performant that 4 workers, except for 3 FFTs where the difference is so small as to be negligible. Of course, the trend is there, so maybe 3.2 & below might show a clear superiority for the 4 workers.
If you plot the slopes of iter/sec per GHz, the threads slope is steeper. I'll take more measurements and update the thread when done.

Most impressive to me so far is that I can run at 3.3 GHz for 95% performance for 86% of running costs, before undervolting.

Quote:
Originally Posted by lavalamp View Post
Of course, the real metric should be performance / total cost of ownership.

Add up the cost of the computer, as well as the cost of all previously used, and expected future use of electricity (based on chosen clock speed). Optionally also include the cost of your time working on setting up and configuring the machine. :)
It would have been much more cost efficient to go with the i5-6400, but I am hoping the i5-6600 will have better resale value in the future. Likewise I went with 32 GB of RAM. The additional $100/node also gives me more flexibility if I want to use the cluster for anything else. The four nodes share a single power supply.

Quote:
Originally Posted by retina View Post
And if you lose one week of work trying to figure out the sweet spot and gain 3% then you have to work 24/7 for 32 weeks just to break even.
It would be hard to lose a week of work. I'm only experimenting on a single node, and when I am measuring the power draw I'm doing DC. Testing stability with undervolting will certainly take more time!
Old 2016-12-26, 23:28   #733
lavalamp
Oct 2007
Manchester, UK

23×59 Posts

Quote:
Originally Posted by Mark Rose View Post
It would have been much more cost efficient to go with the i5-6400, but I am hoping the i5-6600 will have better resale value in the future.
Unless you plan on selling in the near future, I would have thought the resale value on these would be essentially nil. Intel will release the 7xxx series early next year, and then (at least they claim) the 8xxx series later in the year, meaning you'll already be two generations behind.

How long do you plan on running these machines, and what do you project an extra ~10W / machine will cost over such a time?

Edit: A rough calculation for 4 machines. 10W / machine is ~1kWh / day, and based on these figures electricity costs 11.16 c/kWh in Toronto now, but I'll round it up to 12 c/kWh. Over 4 years that comes to $175.
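For what it's worth, the arithmetic behind that estimate looks like this (a sketch; rounding the 0.96 kWh/day up to ~1 kWh/day is what gives the $175 figure):

```python
machines = 4
extra_watts = 10       # assumed extra draw per machine
price_per_kwh = 0.12   # 11.16 c/kWh rounded up

kwh_per_day = machines * extra_watts * 24 / 1000      # 0.96 kWh/day
cost_4yr = kwh_per_day * 365 * 4 * price_per_kwh
print(round(kwh_per_day, 2), round(cost_4yr))  # 0.96 168
```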

Last fiddled with by lavalamp on 2016-12-26 at 23:35
Old 2016-12-27, 02:58   #734
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

B72₁₆ Posts

Unfortunately I wasn't able to underclock my CPU. I didn't know the fixed multiplier also prevents underclocking. I have yet to try undervolting.

Quote:
Originally Posted by lavalamp View Post
Unless you plan on selling in the near-future, I would have thought the resale value on these would be essentially nil. Intel will release the 7xxx series early next year, and then (at least they claim) the 8xxx series later in the year, meaning you'll already be 2 generations behind.
There is a vibrant used component market here. I intend to sell the whole lot to a used system builder. Of greater concern with selling is what AMD Ryzen will do.

Quote:
How long do you plan on running these machines, and what do you project an extra ~10W / machine will cost over such a time?

Edit: A rough calculation for 4 machines. 10W / machine is ~1kWh / day, and based on these figures electricity costs 11.16 c/kWh in Toronto now, but I'll round it up to 12 c/kWh. Over 4 years that comes to $175.
I wish electricity were so cheap. The actual electricity cost, once all the taxes, fees, etc., are factored in, is 17.7¢/kWh for the first 600 kWh and 19.5¢/kWh after. In the cooler months when not using AC, I've been using about 550 kWh with the cluster running. This past summer my highest monthly usage was about 1250 kWh. For simplicity I'll say the cluster incrementally costs me 18.5¢/kWh.

The cluster, not including the Ethernet switch, consumes 370 watts when running at stock clocks. That's 266 kWh/month, or $591/year. So if I can save 50 watts across the cluster, that's 36 kWh/month, or $80/year. I figure I'll run the cluster at most 2.5 years more.
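Those figures check out, assuming a 30-day month (which is what ~266 kWh/month implies):

```python
cluster_watts = 370
price_per_kwh = 0.185        # blended incremental rate, $/kWh
hours_per_month = 24 * 30    # 30-day month assumed

kwh_per_month = cluster_watts * hours_per_month / 1000        # 266.4 kWh
cost_per_year = kwh_per_month * 12 * price_per_kwh            # ~$591
saving_per_year = 50 * hours_per_month / 1000 * 12 * price_per_kwh  # ~$80
print(round(kwh_per_month), round(cost_per_year), round(saving_per_year))
# 266 591 80
```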
Old 2017-02-14, 08:34   #735
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

2·5·293 Posts

Single Rank versus Dual Rank DDR3

I now have two i3-4170 systems, both running 2x4 GB of DDR3-1600 RAM in dual channel configuration.

Dual rank:

[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 4.355 ms., avg time: 4.413 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 5.554 ms., avg time: 5.562 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 6.644 ms., avg time: 6.651 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 8.000 ms., avg time: 8.007 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 9.072 ms., avg time: 9.084 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 11.528 ms., avg time: 11.545 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 13.875 ms., avg time: 13.886 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 16.453 ms., avg time: 16.466 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 18.976 ms., avg time: 18.991 ms.
[Work thread Feb 14 03:20] Timing FFTs using 2 threads on 2 physical CPUs.
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 2.401 ms., avg time: 2.417 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 3.047 ms., avg time: 3.114 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 3.651 ms., avg time: 3.732 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 5.132 ms., avg time: 5.152 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 5.079 ms., avg time: 5.102 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 6.309 ms., avg time: 6.408 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 7.651 ms., avg time: 7.779 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 9.211 ms., avg time: 9.278 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 10.455 ms., avg time: 10.505 ms.

[Work thread Feb 14 03:20] Benchmarking multiple workers to measure the impact of memory bandwidth
[Work thread Feb 14 03:20] Timing 1024K FFT, 2 cpus, 1 worker. Average times: 2.41 ms. Total throughput: 414.57 iter/sec.
[Work thread Feb 14 03:21] Timing 1024K FFT, 2 cpus, 2 workers. Average times: 4.79, 4.77 ms. Total throughput: 418.37 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 1 worker. Average times: 3.10 ms. Total throughput: 323.10 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 2 workers. Average times: 6.29, 6.00 ms. Total throughput: 325.74 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 1 worker. Average times: 3.67 ms. Total throughput: 272.60 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 2 workers. Average times: 7.20, 7.20 ms. Total throughput: 277.67 iter/sec.
[Work thread Feb 14 03:23] Timing 1792K FFT, 2 cpus, 1 worker. Average times: 4.43 ms. Total throughput: 225.57 iter/sec.
[Work thread Feb 14 03:23] Timing 1792K FFT, 2 cpus, 2 workers. Average times: 9.24, 8.61 ms. Total throughput: 224.36 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 1 worker. Average times: 5.05 ms. Total throughput: 198.00 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 2 workers. Average times: 10.26, 9.79 ms. Total throughput: 199.62 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 1 worker. Average times: 6.31 ms. Total throughput: 158.43 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 2 workers. Average times: 12.51, 12.51 ms. Total throughput: 159.83 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 1 worker. Average times: 7.88 ms. Total throughput: 126.94 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 2 workers. Average times: 16.51, 15.39 ms. Total throughput: 125.51 iter/sec.
[Work thread Feb 14 03:25] Timing 3584K FFT, 2 cpus, 1 worker. Average times: 9.20 ms. Total throughput: 108.75 iter/sec.
[Work thread Feb 14 03:26] Timing 3584K FFT, 2 cpus, 2 workers. Average times: 18.26, 17.89 ms. Total throughput: 110.63 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 1 worker. Average times: 10.55 ms. Total throughput: 94.78 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 2 workers. Average times: 20.70, 20.70 ms. Total throughput: 96.61 iter/sec.

Single rank:

[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 4.549 ms., avg time: 4.560 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 5.789 ms., avg time: 5.800 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 6.920 ms., avg time: 6.930 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 8.945 ms., avg time: 8.963 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 9.365 ms., avg time: 9.378 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 11.980 ms., avg time: 11.988 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 14.513 ms., avg time: 14.525 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 17.111 ms., avg time: 17.156 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 19.565 ms., avg time: 19.578 ms.
[Work thread Feb 14 03:20] Timing FFTs using 2 threads on 2 physical CPUs.
[Work thread Feb 14 03:20] Timing 46 iterations of 1024K FFT length. Best time: 2.751 ms., avg time: 2.763 ms.
[Work thread Feb 14 03:20] Timing 36 iterations of 1280K FFT length. Best time: 3.496 ms., avg time: 3.513 ms.
[Work thread Feb 14 03:20] Timing 30 iterations of 1536K FFT length. Best time: 4.050 ms., avg time: 4.095 ms.
[Work thread Feb 14 03:20] Timing 26 iterations of 1792K FFT length. Best time: 4.952 ms., avg time: 4.993 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2048K FFT length. Best time: 5.687 ms., avg time: 5.714 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 2560K FFT length. Best time: 7.372 ms., avg time: 7.387 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3072K FFT length. Best time: 8.716 ms., avg time: 8.803 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 3584K FFT length. Best time: 10.557 ms., avg time: 10.651 ms.
[Work thread Feb 14 03:20] Timing 25 iterations of 4096K FFT length. Best time: 11.389 ms., avg time: 11.480 ms.

[Work thread Feb 14 03:20] Benchmarking multiple workers to measure the impact of memory bandwidth
[Work thread Feb 14 03:20] Timing 1024K FFT, 2 cpus, 1 worker. Average times: 2.61 ms. Total throughput: 383.01 iter/sec.
[Work thread Feb 14 03:21] Timing 1024K FFT, 2 cpus, 2 workers. Average times: 5.25, 5.23 ms. Total throughput: 381.51 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 1 worker. Average times: 3.34 ms. Total throughput: 299.68 iter/sec.
[Work thread Feb 14 03:21] Timing 1280K FFT, 2 cpus, 2 workers. Average times: 6.59, 6.56 ms. Total throughput: 304.07 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 1 worker. Average times: 3.96 ms. Total throughput: 252.35 iter/sec.
[Work thread Feb 14 03:22] Timing 1536K FFT, 2 cpus, 2 workers. Average times: 7.90, 7.89 ms. Total throughput: 253.21 iter/sec.
[Work thread Feb 14 03:22] Timing 1792K FFT, 2 cpus, 1 worker. Average times: 4.84 ms. Total throughput: 206.76 iter/sec.
[Work thread Feb 14 03:23] Timing 1792K FFT, 2 cpus, 2 workers. Average times: 9.61, 9.61 ms. Total throughput: 208.08 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 1 worker. Average times: 5.52 ms. Total throughput: 181.32 iter/sec.
[Work thread Feb 14 03:23] Timing 2048K FFT, 2 cpus, 2 workers. Average times: 10.87, 10.87 ms. Total throughput: 183.92 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 1 worker. Average times: 6.99 ms. Total throughput: 143.16 iter/sec.
[Work thread Feb 14 03:24] Timing 2560K FFT, 2 cpus, 2 workers. Average times: 13.98, 13.89 ms. Total throughput: 143.52 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 1 worker. Average times: 8.47 ms. Total throughput: 118.00 iter/sec.
[Work thread Feb 14 03:25] Timing 3072K FFT, 2 cpus, 2 workers. Average times: 16.75, 16.62 ms. Total throughput: 119.87 iter/sec.
[Work thread Feb 14 03:25] Timing 3584K FFT, 2 cpus, 1 worker. Average times: 10.03 ms. Total throughput: 99.69 iter/sec.
[Work thread Feb 14 03:26] Timing 3584K FFT, 2 cpus, 2 workers. Average times: 20.07, 19.75 ms. Total throughput: 100.46 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 1 worker. Average times: 11.44 ms. Total throughput: 87.38 iter/sec.
[Work thread Feb 14 03:26] Timing 4096K FFT, 2 cpus, 2 workers. Average times: 22.67, 22.57 ms. Total throughput: 88.42 iter/sec.


TL;DR: get dual rank memory.
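To put a number on it, here's a small script (my own tally, with the 2-worker throughput figures copied from the two logs above) showing the dual-rank advantage:

```python
# Dual-rank vs single-rank total throughput (iter/sec, 2 cpus, 2 workers),
# taken from the benchmark logs above.
dual   = {"1024K": 418.37, "2048K": 199.62, "4096K": 96.61}
single = {"1024K": 381.51, "2048K": 183.92, "4096K": 88.42}

for fft, d in dual.items():
    gain = (d / single[fft] - 1) * 100
    print(f"{fft} FFT: dual rank is {gain:.1f}% faster")
```

Roughly a 9% throughput gain across FFT sizes for free, just from memory rank.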
Mark Rose is offline   Reply With Quote
Old 2017-02-14, 17:15   #736
kladner
 
kladner's Avatar
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts
Default

Quote:
TL;DR: get dual rank memory.
I wish it were easier to determine which parts are dual rank.
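One common approach on Linux (assuming the BIOS populates the SMBIOS memory-device records, which not all boards do) is to ask dmidecode; the Type 17 entries include a Rank field on most modern systems:

```shell
# List installed modules with their slot and rank (needs root).
sudo dmidecode --type memory | grep -E 'Locator|Rank'
```

On Windows, CPU-Z's SPD tab shows the rank of each module. Retail part listings are less reliable, since vendors reuse model numbers across revisions.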
kladner is offline   Reply With Quote
Old 2017-03-08, 22:29   #737
VictordeHolland
 
VictordeHolland's Avatar
 
"Victor de Hollander"
Aug 2011
the Netherlands

23×3×72 Posts
Default

Just for fun I ran the Prime95 benchmark on my old 11'' netbook with AMD E-350 @1.6GHz

Code:
AMD E-350 Processor
CPU speed: 1596.06 MHz, 2 cores
CPU features: 3DNow! Prefetch, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 40
L2 TLBS: 512
Prime95 32-bit version 28.10, RdtscTiming=1
Best time for 1024K FFT length: 99.916 ms., avg: 103.876 ms.
Best time for 1280K FFT length: 142.989 ms., avg: 149.942 ms.
Best time for 1536K FFT length: 167.401 ms., avg: 174.861 ms.
Best time for 1792K FFT length: 196.051 ms., avg: 213.505 ms.
Best time for 2048K FFT length: 222.136 ms., avg: 231.517 ms.
Best time for 2560K FFT length: 275.761 ms., avg: 289.138 ms.
Best time for 3072K FFT length: 347.029 ms., avg: 357.043 ms.
Best time for 3584K FFT length: 403.094 ms., avg: 417.261 ms.
Best time for 4096K FFT length: 458.478 ms., avg: 477.660 ms.
Best time for 5120K FFT length: 645.214 ms., avg: 656.045 ms.
Best time for 6144K FFT length: 778.602 ms., avg: 801.066 ms.
Best time for 7168K FFT length: 936.351 ms., avg: 955.565 ms.
Best time for 8192K FFT length: 1056.663 ms., avg: 1081.029 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 50.489 ms., avg: 51.903 ms.
Best time for 1280K FFT length: 71.734 ms., avg: 74.501 ms.
Best time for 1536K FFT length: 83.726 ms., avg: 86.498 ms.
Best time for 1792K FFT length: 98.136 ms., avg: 101.268 ms.
Best time for 2048K FFT length: 111.295 ms., avg: 113.926 ms.
Best time for 2560K FFT length: 137.782 ms., avg: 144.518 ms.
Best time for 3072K FFT length: 172.050 ms., avg: 178.164 ms.
Best time for 3584K FFT length: 203.060 ms., avg: 208.392 ms.
Best time for 4096K FFT length: 230.258 ms., avg: 236.426 ms.
Best time for 5120K FFT length: 319.971 ms., avg: 327.913 ms.
Best time for 6144K FFT length: 397.567 ms., avg: 418.475 ms.
Best time for 7168K FFT length: 477.554 ms., avg: 487.654 ms.
Best time for 8192K FFT length: 524.522 ms., avg: 534.575 ms.

Timings for 1024K FFT length (2 cpus, 2 workers): 104.66, 98.42 ms.  Throughput: 19.72 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 149.56, 140.02 ms.  Throughput: 13.83 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 176.08, 163.24 ms.  Throughput: 11.81 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 198.76, 186.35 ms.  Throughput: 10.40 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 223.87, 209.46 ms.  Throughput:  9.24 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 288.10, 267.54 ms.  Throughput:  7.21 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 356.97, 335.54 ms.  Throughput:  5.78 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 429.52, 401.99 ms.  Throughput:  4.82 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 497.72, 464.37 ms.  Throughput:  4.16 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 695.86, 655.48 ms.  Throughput:  2.96 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 792.26, 738.64 ms.  Throughput:  2.62 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 986.58, 920.18 ms.  Throughput:  2.10 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 1096.14, 1023.91 ms.  Throughput:  1.89 iter/sec.
VictordeHolland is offline   Reply With Quote
Old 2017-03-11, 20:56   #738
thedigitalone
 
Mar 2017
PNW

18 Posts
Default AMD Ryzen 7 1800X Eight-Core Benchmark

AMD Ryzen 7 1800X Eight-Core Processor
CPU speed: 3447.35 MHz, 16 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 512 KB, L3 cache size: 16 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1536
Prime95 64-bit version 28.10, RdtscTiming=1
Best time for 1024K FFT length: 13.807 ms., avg: 14.273 ms.
Best time for 1280K FFT length: 18.159 ms., avg: 18.537 ms.
Best time for 1536K FFT length: 22.682 ms., avg: 23.107 ms.
Best time for 1792K FFT length: 26.061 ms., avg: 26.566 ms.
Best time for 2048K FFT length: 29.822 ms., avg: 30.188 ms.
Best time for 2560K FFT length: 37.698 ms., avg: 39.824 ms.
Best time for 3072K FFT length: 45.986 ms., avg: 46.468 ms.
Best time for 3584K FFT length: 53.277 ms., avg: 53.670 ms.
Best time for 4096K FFT length: 61.075 ms., avg: 61.465 ms.
Best time for 5120K FFT length: 79.219 ms., avg: 79.853 ms.
Best time for 6144K FFT length: 97.915 ms., avg: 98.519 ms.
Best time for 7168K FFT length: 114.295 ms., avg: 115.312 ms.
Best time for 8192K FFT length: 126.941 ms., avg: 127.963 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 7.076 ms., avg: 7.169 ms.
Best time for 1280K FFT length: 9.170 ms., avg: 9.299 ms.
Best time for 1536K FFT length: 11.285 ms., avg: 11.411 ms.
Best time for 1792K FFT length: 13.462 ms., avg: 13.579 ms.
Best time for 2048K FFT length: 14.929 ms., avg: 15.151 ms.
Best time for 2560K FFT length: 18.810 ms., avg: 19.041 ms.
Best time for 3072K FFT length: 23.086 ms., avg: 23.249 ms.
Best time for 3584K FFT length: 27.528 ms., avg: 27.643 ms.
Best time for 4096K FFT length: 30.559 ms., avg: 30.740 ms.
Best time for 5120K FFT length: 39.988 ms., avg: 40.245 ms.
Best time for 6144K FFT length: 48.979 ms., avg: 49.186 ms.
Best time for 7168K FFT length: 58.807 ms., avg: 59.122 ms.
Best time for 8192K FFT length: 63.872 ms., avg: 64.181 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 4.809 ms., avg: 4.861 ms.
Best time for 1280K FFT length: 6.261 ms., avg: 6.294 ms.
Best time for 1536K FFT length: 7.697 ms., avg: 7.875 ms.
Best time for 1792K FFT length: 9.046 ms., avg: 9.085 ms.
Best time for 2048K FFT length: 10.120 ms., avg: 10.228 ms.
Best time for 2560K FFT length: 12.718 ms., avg: 13.066 ms.
Best time for 3072K FFT length: 15.739 ms., avg: 15.773 ms.
Best time for 3584K FFT length: 18.524 ms., avg: 18.633 ms.
Best time for 4096K FFT length: 20.753 ms., avg: 20.859 ms.
Best time for 5120K FFT length: 27.089 ms., avg: 27.224 ms.
Best time for 6144K FFT length: 33.261 ms., avg: 33.396 ms.
Best time for 7168K FFT length: 39.614 ms., avg: 39.760 ms.
Best time for 8192K FFT length: 43.169 ms., avg: 43.302 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 3.602 ms., avg: 3.636 ms.
Best time for 1280K FFT length: 4.677 ms., avg: 4.888 ms.
Best time for 1536K FFT length: 5.749 ms., avg: 5.790 ms.
Best time for 1792K FFT length: 6.857 ms., avg: 7.013 ms.
Best time for 2048K FFT length: 7.604 ms., avg: 7.692 ms.
Best time for 2560K FFT length: 9.624 ms., avg: 9.911 ms.
Best time for 3072K FFT length: 11.773 ms., avg: 11.844 ms.
Best time for 3584K FFT length: 14.034 ms., avg: 14.151 ms.
Best time for 4096K FFT length: 15.621 ms., avg: 15.658 ms.
Best time for 5120K FFT length: 20.389 ms., avg: 20.476 ms.
Best time for 6144K FFT length: 25.047 ms., avg: 25.197 ms.
Best time for 7168K FFT length: 30.019 ms., avg: 30.175 ms.
Best time for 8192K FFT length: 32.537 ms., avg: 32.675 ms.
Timing FFTs using 5 threads.
Best time for 1024K FFT length: 2.925 ms., avg: 2.953 ms.
Best time for 1280K FFT length: 3.802 ms., avg: 3.868 ms.
Best time for 1536K FFT length: 4.691 ms., avg: 4.757 ms.
Best time for 1792K FFT length: 5.526 ms., avg: 5.581 ms.
Best time for 2048K FFT length: 6.164 ms., avg: 6.211 ms.
Best time for 2560K FFT length: 7.778 ms., avg: 7.811 ms.
Best time for 3072K FFT length: 9.525 ms., avg: 10.635 ms.
Best time for 3584K FFT length: 11.330 ms., avg: 11.412 ms.
Best time for 4096K FFT length: 12.643 ms., avg: 12.736 ms.
Best time for 5120K FFT length: 16.489 ms., avg: 17.282 ms.
Best time for 6144K FFT length: 20.238 ms., avg: 20.950 ms.
Best time for 7168K FFT length: 24.345 ms., avg: 24.513 ms.
Best time for 8192K FFT length: 26.286 ms., avg: 26.744 ms.
Timing FFTs using 6 threads.
Best time for 1024K FFT length: 2.470 ms., avg: 2.858 ms.
Best time for 1280K FFT length: 3.207 ms., avg: 3.360 ms.
Best time for 1536K FFT length: 3.912 ms., avg: 3.999 ms.
Best time for 1792K FFT length: 4.741 ms., avg: 5.378 ms.
Best time for 2048K FFT length: 5.243 ms., avg: 5.828 ms.
Best time for 2560K FFT length: 6.570 ms., avg: 7.218 ms.
Best time for 3072K FFT length: 8.085 ms., avg: 8.629 ms.
Best time for 3584K FFT length: 9.679 ms., avg: 11.073 ms.
Best time for 4096K FFT length: 10.725 ms., avg: 11.134 ms.
Best time for 5120K FFT length: 13.989 ms., avg: 14.172 ms.
Best time for 6144K FFT length: 17.201 ms., avg: 17.386 ms.
Best time for 7168K FFT length: 21.077 ms., avg: 21.710 ms.
Best time for 8192K FFT length: 22.249 ms., avg: 22.662 ms.
Timing FFTs using 7 threads.
Best time for 1024K FFT length: 2.160 ms., avg: 2.268 ms.
Best time for 1280K FFT length: 3.178 ms., avg: 4.154 ms.
Best time for 1536K FFT length: 3.406 ms., avg: 3.569 ms.
Best time for 1792K FFT length: 4.156 ms., avg: 4.415 ms.
Best time for 2048K FFT length: 4.602 ms., avg: 4.904 ms.
Best time for 2560K FFT length: 5.741 ms., avg: 6.230 ms.
Best time for 3072K FFT length: 7.109 ms., avg: 7.462 ms.
thedigitalone is offline   Reply With Quote
Old 2017-03-23, 05:47   #739
Mark Rose
 
Mark Rose's Avatar
 
"/X\(‘-‘)/X\"
Jan 2013

293010 Posts
Default

Clear Linux gives a small boost in mprime throughput.

I've recently been playing around with Intel's Clear Linux distribution. It's compiled with optimizations and built specifically for Intel's latest processors. Given that mprime's LL is mostly hand-tuned assembly, I wasn't expecting to see a difference in performance compared to Ubuntu 16.04, but I have.

I'm running my cluster of i5-6600s at 3.3 GHz, as the dual-rank, dual-channel DDR3-2133 makes it not worth the watts to run the CPUs any faster.

That being said, Clear Linux at 3.3 GHz is up to 3% faster than Ubuntu at 3.6 GHz.

I've updated my benchmark spreadsheet.

My guess is the difference comes down to different kernels and fewer background tasks running.
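Back-of-envelope (an inference from the headline numbers above, not a separate measurement): matching Ubuntu at 3.6 GHz plus 3% while running at only 3.3 GHz implies a noticeable per-clock gain:

```python
# Implied per-clock throughput gain, inferred from the post's headline numbers.
clear_ghz, ubuntu_ghz = 3.3, 3.6
throughput_ratio = 1.03  # Clear Linux is "up to 3% faster" overall

per_clock = throughput_ratio * ubuntu_ghz / clear_ghz
print(f"~{(per_clock - 1) * 100:.0f}% more throughput per clock")
```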
Mark Rose is offline   Reply With Quote
Old 2017-04-05, 14:39   #740
Mark Rose
 
Mark Rose's Avatar
 
"/X\(‘-‘)/X\"
Jan 2013

293010 Posts
Default

I finally got around to experimenting with undervolting. So far I've lowered VCore by 0.10 volts and I've passed a 7 hour stress test. The result? Saved another 12.5 watts per node, so my 4 node cluster is now consuming only 270 watts at the wall, or 243 from the nodes (at 3.3 GHz all cores). With a 4096 FFT, 4 cores take 5.37 ms/iter, for 2.76 iter/sec/watt at the wall, or 3.06 iter/sec/watt from the nodes.

Compare that to the GTX 1080 Ti, which consumes 180 watts from the card to get 2.63 ms/iter, for 2.12 iter/sec/watt.

I wasn't expecting CPUs to be 44% more efficient.

I'm going to try lowering VCore more soon. I might have to add more nodes to this power supply.
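For anyone checking my numbers, the efficiency figures reduce to iter/sec = workers × 1000 / (ms/iter), divided by watts (rounding in the post differs slightly in the last digit):

```python
# Performance-per-watt comparison, using the figures in the post above.
def iters_per_sec(ms_per_iter, units=1):
    """Total iterations/sec for `units` identical workers at ms_per_iter each."""
    return units * 1000.0 / ms_per_iter

cluster = iters_per_sec(5.37, units=4)  # 4 nodes, 4096K FFT, 4 cores each
gpu = iters_per_sec(2.63)               # GTX 1080 Ti

print(f"cluster, at wall: {cluster / 270:.2f} iter/sec/watt")  # 2.76
print(f"cluster, nodes:   {cluster / 243:.2f} iter/sec/watt")  # ~3.06
print(f"GTX 1080 Ti:      {gpu / 180:.2f} iter/sec/watt")      # ~2.12
```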
Mark Rose is offline   Reply With Quote
Old 2017-04-10, 06:53   #741
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

61×79 Posts
Default Intel(R) Celeron(R) CPU N2840 @ 2.16GHz

Code:
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz
CPU speed: 2557.70 MHz, 2 cores
CPU features: Prefetchw, SSE, SSE2, SSE4
L1 cache size: 24 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Machine topology as determined by hwloc library:
 Machine#0 (total=2492796KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
  NUMANode#0 (local=2492796KB, total=2492796KB)
    Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=55, CPUModel="Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz", CPUStepping=8)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
          Core (cpuset: 0x00000001)
            PU#0 (cpuset: 0x00000001)
        L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
          Core (cpuset: 0x00000002)
            PU#1 (cpuset: 0x00000002)
Prime95 64-bit version 29.1, RdtscTiming=1
Timing FFTs using 2 cores.
Best time for 1024K FFT length: 22.048 ms., avg: 23.196 ms.
Best time for 1280K FFT length: 29.125 ms., avg: 30.133 ms.
Best time for 1536K FFT length: 35.795 ms., avg: 36.288 ms.
Best time for 1792K FFT length: 45.152 ms., avg: 46.324 ms.
Best time for 2048K FFT length: 47.919 ms., avg: 49.040 ms.
Best time for 2560K FFT length: 60.895 ms., avg: 64.173 ms.
Best time for 3072K FFT length: 77.295 ms., avg: 80.964 ms.
Best time for 3584K FFT length: 97.452 ms., avg: 98.772 ms.
Best time for 4096K FFT length: 117.728 ms., avg: 118.825 ms.
Best time for 5120K FFT length: 144.734 ms., avg: 148.510 ms.
Best time for 6144K FFT length: 186.521 ms., avg: 188.309 ms.
Best time for 7168K FFT length: 282.553 ms., avg: 284.238 ms.
Best time for 8192K FFT length: 302.990 ms., avg: 306.707 ms.
ET_ is offline   Reply With Quote
Old 2017-04-10, 07:08   #742
James Heinrich
 
James Heinrich's Avatar
 
"James Heinrich"
May 2004
ex-Northern Ontario

23·149 Posts
Default

Quote:
Originally Posted by ET_ View Post
Intel(R) Celeron(R) CPU N2840 @ 2.16GHz
Timing FFTs using 2 cores.
Almost useful to me, except you only posted timings for 2 cores, not the single-thread test I need for benchmarks.
James Heinrich is offline   Reply With Quote
Old 2017-04-10, 09:14   #743
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

61×79 Posts
Default

Quote:
Originally Posted by James Heinrich View Post
Almost useful to me, except you only posted timings for 2 cores, not the single-thread test I need for benchmarks.
I used the option FFT timings benchmark.

Here are the results for the option Throughput benchmark...

Code:
[Mon Apr 10 10:53:20 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz
CPU speed: 2557.77 MHz, 2 cores
CPU features: Prefetchw, SSE, SSE2, SSE4
L1 cache size: 24 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Machine topology as determined by hwloc library:
 Machine#0 (total=2170248KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
  NUMANode#0 (local=2170248KB, total=2170248KB)
    Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=55, CPUModel="Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz", CPUStepping=8)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
          Core (cpuset: 0x00000001)
            PU#0 (cpuset: 0x00000001)
        L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
          Core (cpuset: 0x00000002)
            PU#1 (cpuset: 0x00000002)
Prime95 64-bit version 29.1, RdtscTiming=1
Timings for 1024K FFT length (2 cpus, 1 worker): 23.38 ms.  Throughput: 42.76 iter/sec.
Timings for 1024K FFT length (2 cpus, 2 workers): 46.76, 46.03 ms.  Throughput: 43.11 iter/sec.
Timings for 1280K FFT length (2 cpus, 1 worker): 30.89 ms.  Throughput: 32.37 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 61.83, 60.54 ms.  Throughput: 32.69 iter/sec.
Timings for 1536K FFT length (2 cpus, 1 worker): 37.43 ms.  Throughput: 26.72 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 76.86, 74.73 ms.  Throughput: 26.39 iter/sec.
Timings for 1792K FFT length (2 cpus, 1 worker): 48.25 ms.  Throughput: 20.73 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 97.16, 91.82 ms.  Throughput: 21.18 iter/sec.
Timings for 2048K FFT length (2 cpus, 1 worker): 51.60 ms.  Throughput: 19.38 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 103.15, 99.23 ms.  Throughput: 19.77 iter/sec.
Timings for 2560K FFT length (2 cpus, 1 worker): 64.17 ms.  Throughput: 15.58 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 128.12, 124.46 ms.  Throughput: 15.84 iter/sec.
Timings for 3072K FFT length (2 cpus, 1 worker): 80.92 ms.  Throughput: 12.36 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 217.55, 216.71 ms.  Throughput:  9.21 iter/sec.
Timings for 3584K FFT length (2 cpus, 1 worker): 114.14 ms.  Throughput:  8.76 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 322.22, 260.20 ms.  Throughput:  6.95 iter/sec.
[Mon Apr 10 10:58:35 2017]
Timings for 4096K FFT length (2 cpus, 1 worker): 152.85 ms.  Throughput:  6.54 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 343.11, 248.39 ms.  Throughput:  6.94 iter/sec.
Timings for 5120K FFT length (2 cpus, 1 worker): 209.21 ms.  Throughput:  4.78 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 474.79, 399.13 ms.  Throughput:  4.61 iter/sec.
Timings for 6144K FFT length (2 cpus, 1 worker): 240.27 ms.  Throughput:  4.16 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 694.67, 595.62 ms.  Throughput:  3.12 iter/sec.
Timings for 7168K FFT length (2 cpus, 1 worker): 805.39 ms.  Throughput:  1.24 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 1108.76, 926.34 ms.  Throughput:  1.98 iter/sec.
Timings for 8192K FFT length (2 cpus, 1 worker): 1045.18 ms.  Throughput:  0.96 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 661.82, 562.99 ms.  Throughput:  3.29 iter/sec.
and the option trial factoring benchmark (repeated because of a strange value appearing in the 76-bit section: I suspect a timing interaction between the factoring threads and the thread responsible for writing data to disk; the same applied to the previous throughput benchmark while the timestamp was being written)

Code:
[Mon Apr 10 11:05:55 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz
CPU speed: 2558.10 MHz, 2 cores
CPU features: Prefetchw, SSE, SSE2, SSE4
L1 cache size: 24 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Machine topology as determined by hwloc library:
 Machine#0 (total=2170248KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
  NUMANode#0 (local=2170248KB, total=2170248KB)
    Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=55, CPUModel="Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz", CPUStepping=8)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
          Core (cpuset: 0x00000001)
            PU#0 (cpuset: 0x00000001)
        L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
          Core (cpuset: 0x00000002)
            PU#1 (cpuset: 0x00000002)
Prime95 64-bit version 29.1, RdtscTiming=1
Best time for 61 bit trial factors: 7.731 ms.
Best time for 62 bit trial factors: 20.862 ms.
Best time for 63 bit trial factors: 15.234 ms.
Best time for 64 bit trial factors: 17.497 ms.
Best time for 65 bit trial factors: 19.764 ms.
Best time for 66 bit trial factors: 19.450 ms.
Best time for 67 bit trial factors: 54.953 ms.
Best time for 75 bit trial factors: 78.660 ms.
Best time for 76 bit trial factors: 1.207 ms.
Best time for 77 bit trial factors: 22.409 ms.
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz
CPU speed: 2557.81 MHz, 2 cores
CPU features: Prefetchw, SSE, SSE2, SSE4
L1 cache size: 24 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Machine topology as determined by hwloc library:
 Machine#0 (total=2170248KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
  NUMANode#0 (local=2170248KB, total=2170248KB)
    Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=55, CPUModel="Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz", CPUStepping=8)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
          Core (cpuset: 0x00000001)
            PU#0 (cpuset: 0x00000001)
        L1d (size=24KB, linesize=64, ways=6, Inclusive=0)
          Core (cpuset: 0x00000002)
            PU#1 (cpuset: 0x00000002)
Prime95 64-bit version 29.1, RdtscTiming=1
Best time for 61 bit trial factors: 7.615 ms.
Best time for 62 bit trial factors: 7.840 ms.
Best time for 63 bit trial factors: 11.009 ms.
Best time for 64 bit trial factors: 13.912 ms.
Best time for 65 bit trial factors: 17.365 ms.
Best time for 66 bit trial factors: 18.360 ms.
Best time for 67 bit trial factors: 18.486 ms.
Best time for 75 bit trial factors: 46.705 ms.
Best time for 76 bit trial factors: 18.084 ms.
Best time for 77 bit trial factors: 19.320 ms.
Let me know if you need any more data

Luigi

Last fiddled with by ET_ on 2017-04-10 at 09:18
ET_ is offline   Reply With Quote
Old 2017-04-11, 08:21   #744
db597
 
db597's Avatar
 
Jan 2003

CB16 Posts
Default Ryzen 1700 benchmark results

I posted the results below from my Ryzen 1700 (non-X) in the AMD Zen speculation thread earlier. I thought I'd consolidate them with the other benchmarks in this thread and add a bit more detail on the setup.

CPU: AMD Ryzen 1700 (non-X)
Frequency: 3.32GHz @ 1.031V (stock rating 3GHz / Turbo 3.7GHz)
Heatsink: AMD Wraith Spire
Memory: Corsair 8GBx2 @ 2933MHz CAS16 (single rank)
Motherboard Asus X370-Pro
BIOS: 0604 (AGESA 1.0.0.4a)
Operating system: Windows 10 x64 Creators Update
Prime95 version: 29.1 Build 15

Code:
AMD Ryzen 7 1700 Eight-Core Processor
CPU speed: 3318.72 MHz, 8 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 512 KB, L3 cache size: 16 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1536
Prime95 64-bit version 29.1, RdtscTiming=1

I rearranged the benchmark results below for easier reading and comparison:

Timings for 1024K FFT length (1 cpu, 1 worker): 7.83 ms. Throughput: 127.69 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker): 9.88 ms. Throughput: 101.17 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker): 11.97 ms. Throughput: 83.57 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker): 14.58 ms. Throughput: 68.60 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker): 16.05 ms. Throughput: 62.29 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker): 20.60 ms. Throughput: 48.55 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 24.87 ms. Throughput: 40.20 iter/sec.
Timings for 3584K FFT length (1 cpu, 1 worker): 29.90 ms. Throughput: 33.44 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 34.18 ms. Throughput: 29.26 iter/sec.
Timings for 5120K FFT length (1 cpu, 1 worker): 42.60 ms. Throughput: 23.48 iter/sec.
Timings for 6144K FFT length (1 cpu, 1 worker): 50.67 ms. Throughput: 19.74 iter/sec.
Timings for 7168K FFT length (1 cpu, 1 worker): 60.12 ms. Throughput: 16.63 iter/sec.
Timings for 8192K FFT length (1 cpu, 1 worker): 68.76 ms. Throughput: 14.54 iter/sec.

Timings for 1024K FFT length (8 cpus, 1 worker): 1.13 ms. Throughput: 886.42 iter/sec.
Timings for 1280K FFT length (8 cpus, 1 worker): 1.42 ms. Throughput: 704.55 iter/sec.
Timings for 1536K FFT length (8 cpus, 1 worker): 1.71 ms. Throughput: 584.87 iter/sec.
Timings for 1792K FFT length (8 cpus, 1 worker): 2.10 ms. Throughput: 475.44 iter/sec.
Timings for 2048K FFT length (8 cpus, 1 worker): 2.39 ms. Throughput: 418.60 iter/sec.
Timings for 2560K FFT length (8 cpus, 1 worker): 3.96 ms. Throughput: 252.38 iter/sec.
Timings for 3072K FFT length (8 cpus, 1 worker): 4.97 ms. Throughput: 201.08 iter/sec.
Timings for 3584K FFT length (8 cpus, 1 worker): 5.97 ms. Throughput: 167.51 iter/sec.
Timings for 4096K FFT length (8 cpus, 1 worker): 6.92 ms. Throughput: 144.58 iter/sec.
Timings for 5120K FFT length (8 cpus, 1 worker): 7.32 ms. Throughput: 136.59 iter/sec.
Timings for 6144K FFT length (8 cpus, 1 worker): 9.37 ms. Throughput: 106.71 iter/sec.
Timings for 7168K FFT length (8 cpus, 1 worker): 10.96 ms. Throughput: 91.21 iter/sec.
Timings for 8192K FFT length (8 cpus, 1 worker): 12.69 ms. Throughput: 78.83 iter/sec.

Timings for 1024K FFT length (8 cpus, 8 workers): 11.30, 11.41, 11.28, 11.22, 11.18, 11.18, 11.21, 11.20 ms. Throughput: 711.26 iter/sec.
Timings for 1280K FFT length (8 cpus, 8 workers): 14.15, 14.51, 14.13, 14.15, 14.03, 14.05, 14.13, 14.16 ms. Throughput: 564.84 iter/sec.
Timings for 1536K FFT length (8 cpus, 8 workers): 16.81, 17.45, 16.96, 17.00, 16.84, 16.82, 16.91, 16.82 ms. Throughput: 472.01 iter/sec.
Timings for 1792K FFT length (8 cpus, 8 workers): 20.85, 21.81, 20.92, 21.12, 20.68, 20.92, 21.25, 20.77 ms. Throughput: 380.31 iter/sec.
Timings for 2048K FFT length (8 cpus, 8 workers): 22.60, 23.32, 22.76, 22.78, 22.54, 22.61, 22.61, 22.54 ms. Throughput: 352.17 iter/sec.
Timings for 2560K FFT length (8 cpus, 8 workers): 33.53, 34.97, 33.76, 34.34, 34.01, 33.93, 34.26, 33.98 ms. Throughput: 234.66 iter/sec.
Timings for 3072K FFT length (8 cpus, 8 workers): 41.23, 42.38, 41.51, 40.71, 40.84, 40.78, 40.87, 41.04 ms. Throughput: 194.34 iter/sec.
Timings for 3584K FFT length (8 cpus, 8 workers): 48.09, 49.43, 47.96, 48.77, 47.89, 47.32, 47.90, 47.23 ms. Throughput: 166.45 iter/sec.
Timings for 4096K FFT length (8 cpus, 8 workers): 56.27, 57.15, 55.09, 55.39, 55.64, 54.99, 54.88, 54.69 ms. Throughput: 144.14 iter/sec.
Timings for 5120K FFT length (8 cpus, 8 workers): 58.15, 60.30, 58.03, 57.82, 57.55, 57.00, 58.24, 57.01 ms. Throughput: 137.94 iter/sec.
Timings for 6144K FFT length (8 cpus, 8 workers): 70.59, 72.77, 71.30, 71.76, 70.77, 70.67, 70.83, 70.63 ms. Throughput: 112.43 iter/sec.
Timings for 7168K FFT length (8 cpus, 8 workers): 87.46, 87.18, 83.29, 83.81, 82.80, 83.61, 83.66, 83.11 ms. Throughput: 94.87 iter/sec.
Timings for 8192K FFT length (8 cpus, 8 workers): 99.83, 99.12, 96.13, 97.41, 96.20, 96.03, 96.76, 96.01 ms. Throughput: 82.33 iter/sec.
The 8192K FFT performance looks incredible on this version of Prime95, especially when all 8 cores are thrown at it. It would be good if someone could post results from a similarly priced Intel i7-7700K on Prime95 v29.1 Build 15 for comparison (I expect the i7 is a lot faster per core, but having double the cores may make it a rather close competition).
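As a sanity check on how the multi-worker throughput figures above are reported (an assumption on my part: the aggregate is simply the sum of each worker's iterations per second), the 1024K 8-worker line can be reproduced from its per-worker timings:

```python
# Aggregate throughput for the 1024K FFT, 8-worker run above.
# Assumption: the reported throughput is the sum of each worker's
# individual rate, i.e. 1000 / (ms per iteration).
per_worker_ms = [11.30, 11.41, 11.28, 11.22, 11.18, 11.18, 11.21, 11.20]

total_iter_per_sec = sum(1000.0 / ms for ms in per_worker_ms)
print(f"{total_iter_per_sec:.2f} iter/sec")  # ~711.3, matching the reported 711.26
```

The tiny discrepancy comes from the per-worker timings being rounded to two decimals in the printed output.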

Last fiddled with by db597 on 2017-04-11 at 08:25
db597 is offline   Reply With Quote
Old 2017-04-12, 14:09   #745
LaurV
Romulan Interpreter
 
LaurV's Avatar
 
Jun 2011
Thailand

26×151 Posts
Default

Well, not exactly the same price range, but as a point of comparison: i7-6950X @ 3.00GHz (yes, underclocked; I'm having momentary cooling problems, as April is the Thai summer, the hottest period of the year, ~45°C outside), with a single worker running on 8 cores (of 10), at the requested FFT size, Prime95 64-bit version 28.10:

<snip>
Timing FFTs using 8 threads on 8 physical CPUs.
<snip>
Best time for 8192K FFT length: 7.136 ms., avg: 7.291 ms.
<snip>

Last fiddled with by LaurV on 2017-04-12 at 14:10
LaurV is offline   Reply With Quote
Old 2017-04-12, 17:28   #746
db597
 
db597's Avatar
 
Jan 2003

7×29 Posts
Default

@LaurV... thanks for the comparison benchmark.

So for the case of both systems running on 8 physical cores, it's 7.136ms for the i7-6950X @ 3.0GHz vs 12.69ms for the Ryzen 1700 @ 3.3GHz. Looks like Intel wins big in terms of IPC.

It would still be interesting to see results from an i7-7700K (half the cores, but higher IPC and a higher clock speed) to compare at a similar cost level (a Ryzen 1700 system still being a bit cheaper than a comparable i7-7700K system).
db597 is offline   Reply With Quote
Old 2017-04-13, 03:19   #747
kladner
 
kladner's Avatar
 
"Kieren"
Jul 2011
In My Own Galaxy!

100111101011102 Posts
Default

I don't understand this reading of the CPU speed. It was, and is, running at 4.20GHz.
RAM is at 3200MHz.
Code:
[Wed Apr 12 22:03:33 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
CPU speed: 4008.14 MHz, 4 hyperthreaded cores
CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Machine topology as determined by hwloc library:
 Machine#0 (total=12649168KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
  NUMANode#0 (local=12649168KB, total=12649168KB)
    Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=94, CPUModel="Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz", CPUStepping=3)
      L3 (size=8192KB, linesize=64, ways=16, Inclusive=1)
        L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x00000003)
              PU#0 (cpuset: 0x00000001)
              PU#1 (cpuset: 0x00000002)
        L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x0000000c)
              PU#2 (cpuset: 0x00000004)
              PU#3 (cpuset: 0x00000008)
        L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x00000030)
              PU#4 (cpuset: 0x00000010)
              PU#5 (cpuset: 0x00000020)
        L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x000000c0)
              PU#6 (cpuset: 0x00000040)
              PU#7 (cpuset: 0x00000080)
Prime95 64-bit version 29.1, RdtscTiming=1
Timings for 1024K FFT length (1 cpu, 1 worker):  3.18 ms.  Throughput: 314.28 iter/sec.
Timings for 1024K FFT length (2 cpus, 1 worker):  1.67 ms.  Throughput: 599.56 iter/sec.
Timings for 1024K FFT length (3 cpus, 1 worker):  1.13 ms.  Throughput: 888.71 iter/sec.
Timings for 1024K FFT length (4 cpus, 1 worker):  0.86 ms.  Throughput: 1161.54 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker):  4.04 ms.  Throughput: 247.48 iter/sec.
Timings for 1280K FFT length (2 cpus, 1 worker):  2.09 ms.  Throughput: 478.34 iter/sec.
Timings for 1280K FFT length (3 cpus, 1 worker):  1.44 ms.  Throughput: 695.49 iter/sec.
Timings for 1280K FFT length (4 cpus, 1 worker):  1.11 ms.  Throughput: 900.27 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker):  4.89 ms.  Throughput: 204.35 iter/sec.
Timings for 1536K FFT length (2 cpus, 1 worker):  2.54 ms.  Throughput: 394.47 iter/sec.
Timings for 1536K FFT length (3 cpus, 1 worker):  1.73 ms.  Throughput: 579.18 iter/sec.
Timings for 1536K FFT length (4 cpus, 1 worker):  1.38 ms.  Throughput: 724.80 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker):  6.14 ms.  Throughput: 162.89 iter/sec.
Timings for 1792K FFT length (2 cpus, 1 worker):  3.24 ms.  Throughput: 308.59 iter/sec.
Timings for 1792K FFT length (3 cpus, 1 worker):  2.17 ms.  Throughput: 461.04 iter/sec.
Timings for 1792K FFT length (4 cpus, 1 worker):  1.70 ms.  Throughput: 588.90 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker):  6.52 ms.  Throughput: 153.46 iter/sec.
Timings for 2048K FFT length (2 cpus, 1 worker):  3.41 ms.  Throughput: 292.96 iter/sec.
Timings for 2048K FFT length (3 cpus, 1 worker):  2.36 ms.  Throughput: 423.56 iter/sec.
Timings for 2048K FFT length (4 cpus, 1 worker):  1.94 ms.  Throughput: 515.17 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker):  8.59 ms.  Throughput: 116.35 iter/sec.
Timings for 2560K FFT length (2 cpus, 1 worker):  4.50 ms.  Throughput: 222.19 iter/sec.
Timings for 2560K FFT length (3 cpus, 1 worker):  3.05 ms.  Throughput: 327.92 iter/sec.
Timings for 2560K FFT length (4 cpus, 1 worker):  2.45 ms.  Throughput: 408.69 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 10.24 ms.  Throughput: 97.65 iter/sec.
Timings for 3072K FFT length (2 cpus, 1 worker):  5.27 ms.  Throughput: 189.81 iter/sec.
Timings for 3072K FFT length (3 cpus, 1 worker):  3.62 ms.  Throughput: 276.07 iter/sec.
Timings for 3072K FFT length (4 cpus, 1 worker):  2.95 ms.  Throughput: 339.20 iter/sec.
[Wed Apr 12 22:08:44 2017]
Timings for 3584K FFT length (1 cpu, 1 worker): 12.36 ms.  Throughput: 80.90 iter/sec.
Timings for 3584K FFT length (2 cpus, 1 worker):  6.34 ms.  Throughput: 157.62 iter/sec.
Timings for 3584K FFT length (3 cpus, 1 worker):  4.33 ms.  Throughput: 230.80 iter/sec.
Timings for 3584K FFT length (4 cpus, 1 worker):  3.53 ms.  Throughput: 283.48 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 14.18 ms.  Throughput: 70.50 iter/sec.
Timings for 4096K FFT length (2 cpus, 1 worker):  7.33 ms.  Throughput: 136.44 iter/sec.
Timings for 4096K FFT length (3 cpus, 1 worker):  5.01 ms.  Throughput: 199.63 iter/sec.
Timings for 4096K FFT length (4 cpus, 1 worker):  4.07 ms.  Throughput: 245.91 iter/sec.
kladner is offline   Reply With Quote
Old 2017-04-13, 03:36   #748
James Heinrich
 
James Heinrich's Avatar
 
"James Heinrich"
May 2004
ex-Northern Ontario

D6316 Posts
Default

Quote:
Originally Posted by kladner View Post
I don't understand this read on CPU speed. It was, and is running at 4.20GHz
According to Intel, Processor Base Frequency is 4.0GHz, Max Turbo Frequency is 4.2GHz.
I would guess that Prime95 reads the processor frequency on startup before starting the actual benchmark, and turbo doesn't kick in until the CPU is under load. You could use your favourite monitoring utility (e.g. CPU-Z) to monitor CPU frequency in realtime and see how it changes as you start/run the benchmark.
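On Linux the same check can be done without a third-party tool by sampling the per-core frequencies from /proc/cpuinfo while the benchmark runs. A small sketch (a hypothetical helper of my own, not anything Prime95 provides; on Windows, CPU-Z serves the same purpose):

```python
# Parse the per-core "cpu MHz" values from /proc/cpuinfo text (Linux).
# Sampling this in a loop while the benchmark starts shows turbo kicking in.
import re

def core_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value for each logical core found in the text."""
    return [float(m) for m in re.findall(r"cpu MHz\s*:\s*([\d.]+)", cpuinfo_text)]

# Example input in the format /proc/cpuinfo uses:
sample = "cpu MHz\t\t: 4008.142\ncpu MHz\t\t: 4200.001\n"
print(core_mhz(sample))  # → [4008.142, 4200.001]
```

To watch it live, read the real file (`open("/proc/cpuinfo").read()`) once a second before and after starting the benchmark; the reported MHz should jump from base to turbo as the load comes on.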
James Heinrich is offline   Reply With Quote
Old 2017-04-13, 04:07   #749
kladner
 
kladner's Avatar
 
"Kieren"
Jul 2011
In My Own Galaxy!

27AE16 Posts
Default

Quote:
Originally Posted by James Heinrich View Post
According to Intel, Processor Base Frequency is 4.0GHz, Max Turbo Frequency is 4.2GHz.
I would guess that Prime95 reads the processor frequency on startup before starting the actual benchmark, and turbo doesn't kick in until the CPU is under load. You could use your favourite monitoring utility (e.g. CPU-Z) to monitor CPU frequency in realtime and see how it changes as you start/run the benchmark.
The first is of course true. I have mine set to sync all cores at a 42x multiplier. It will still clock down with no extra system load, but the clock rate is very twitchy and goes to full speed with minimal load increase.

I will have to see whether it goes to max when starting the benchmark. This will include trying to limit other loads (like shutting down multi-tabbed Firefox), and maybe even stopping mfaktc to limit whatever memory contention might arise. Of course, limiting other loads is not a Real Life® situation either. Most of the time I expect this machine to do a bunch of other stuff when I want it to, and that certainly takes a toll on the ms/iter.

Last fiddled with by kladner on 2017-04-13 at 04:10
kladner is offline   Reply With Quote
Old 2017-04-13, 05:15   #750
db597
 
db597's Avatar
 
Jan 2003

20310 Posts
Default

So from the benchmarks it looks like 8 Ryzen cores are still slower than 4 Skylake/Kabylake cores:

Ryzen @ 3.3GHz:
Code:
Timings for 4096K FFT length (8 cpus, 1 worker): 6.92 ms. Throughput: 144.58 iter/sec.
i7-6700K @ 4.2GHz:
Code:
Timings for 4096K FFT length (4 cpus, 1 worker):  4.07 ms.  Throughput: 245.91 iter/sec.
Given it's a new architecture, I wonder if there is AMD-specific optimisation that can be done. Going above 4 cores on Ryzen yields very small gains, which could be down to the slow communication between the CCXs. If we can keep each worker within its own CCX, perhaps 2 workers × 4 cores each would be optimal for Ryzen.
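The poor scaling shows up directly in the numbers already posted. A rough check of parallel efficiency (speedup divided by core count) using the 4096K single-worker figures from both machines above:

```python
# Parallel efficiency (speedup / core count) from the 4096K FFT
# single-worker timings posted earlier in this thread.
def efficiency(ms_1core, ms_ncores, n):
    speedup = ms_1core / ms_ncores
    return speedup / n

ryzen = efficiency(34.18, 6.92, 8)    # Ryzen 1700: 1 core vs 8 cores
i6700k = efficiency(14.18, 4.07, 4)   # i7-6700K: 1 core vs 4 cores
print(f"Ryzen 1700, 8 cores:  {ryzen:.1%}")   # ~62% efficient
print(f"i7-6700K, 4 cores:    {i6700k:.1%}")  # ~87% efficient
```

Ryzen retains only about 62% efficiency at 8 cores while the i7-6700K keeps about 87% at 4, which is at least consistent with cross-CCX traffic (or memory bandwidth, which also matters at large FFT sizes) being the bottleneck rather than the cores themselves. On Linux, the 2×4 idea could be tested by pinning each worker instance to one CCX (cores 0-3 and 4-7) with taskset.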
db597 is offline   Reply With Quote