mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware

Old 2016-02-03, 16:20   #651
Madpoo
Serpentine Vermin Jar
 
 
Jul 2014

3,313 Posts

Quote:
Originally Posted by axn View Post
4096K FFT consumes 32MB of memory (plus change). This can run (almost) entirely out of the huge 35MB L3 cache of the Xeon. So despite the loss of efficiency due to multithreading, the 1 worker setup wins out.
I wondered how much the L2/L3 cache plays a part in it, but since I have no idea what size each "chunk" of work in a multi-threaded worker is, I couldn't even hazard a guess.

That's another thing where George might get some optimizations: sizing the chunk of data each core works on to match its share of the L3 cache, be it 1.5 MB, 2 MB, 2.5 MB, etc. As in, would it be faster to do several smaller chunks of work that fit in cache, or one larger chunk that would, by necessity, have to go out to main RAM?
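To put rough numbers on the cache question, here is a back-of-envelope sketch (purely illustrative) of the main-array working set per FFT length, at 8 bytes per double-precision element; axn's "plus change" (scratch space, twiddle tables) is not counted:

```python
# Approximate main-array working set for an n-point double-precision FFT:
# n elements * 8 bytes each. Scratch buffers and twiddle tables ("plus
# change") are ignored, so real usage is somewhat higher.
def fft_working_set_mib(fft_k: int) -> float:
    """MiB needed by the main data array of an fft_k-'K' FFT length."""
    points = fft_k * 1024              # e.g. 4096K -> 4,194,304 points
    return points * 8 / (1024 ** 2)    # bytes -> MiB

L3_MB = 35  # shared L3 of the Xeon discussed above

for k in (4096, 5120, 8192):
    ws = fft_working_set_mib(k)
    print(f"{k}K FFT: ~{ws:.0f} MiB main array (L3 is {L3_MB} MB)")
```

By this arithmetic a 4096K FFT's ~32 MiB main array just fits in the 35 MB L3, while 5120K (~40 MiB) and larger spill out to main RAM, which matches axn's explanation.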
Old 2016-02-05, 05:12   #652
xtreme2k
 
 
Aug 2002

10101110₂ Posts

Quote:
Originally Posted by axn View Post
4096K FFT consumes 32MB of memory (plus change). This can run (almost) entirely out of the huge 35MB L3 cache of the Xeon. So despite the loss of efficiency due to multithreading, the 1 worker setup wins out.
That is very interesting. I guess when we move to 5120K FFTs I would lose the advantage. In fact, we aren't that far away.

Should've got the 2696v3/2699v3 with 45MB L3 then
Old 2016-02-05, 07:31   #653
axn
 
 
Jun 2003

5,087 Posts

Quote:
Originally Posted by xtreme2k View Post
That is very interesting. I guess when we move to 5120K FFTs I would lose the advantage. In fact, we aren't that far away.
Possibly. It would be interesting to find out at what FFT size the advantage shifts back.
Old 2016-02-05, 22:56   #654
Madpoo
Serpentine Vermin Jar
 
 
Jul 2014

3,313 Posts

Quote:
Originally Posted by axn View Post
Possibly. It would be interesting to find out at what FFT size the advantage shifts back.
There is strong evidence to suggest there's a "sweet spot" for LL tests on different CPUs. There are so many variables involved that I haven't quite nailed it down... it sure would be nice to have the benchmark test look at specific things and then have an option to automatically set Prime95 to whatever gives the peak output.

Things like L2/L3 cache sizes, memory speed, FFT sizes, # of cores (threads) per worker, total # of workers, whether you have single or dual+ socket motherboard, etc.

They all play a part and if you have the time and persistence you can figure out what works best for you, but I think the project would benefit from an automated "plug and play" configuration.

I've said it before and I'll say it again: I've seen obvious cases of horribly misconfigured systems that are doing multiple 4M+ FFT tests on a single CPU, and I just know it would be far more efficient if they ran one worker using all cores.

Those are the systems where I can see it has 4 LL tests assigned to it and it's reporting results daily showing minimal progress. Either they only run it an hour a day, or the memory contention is as bad as I think it is, making all of them slow as molasses.
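The automated "plug and play" idea could work roughly like this: benchmark each (workers, threads-per-worker) split, then pick the split with the highest total throughput. A minimal sketch, with made-up illustrative timings (not real Prime95 benchmark data):

```python
# Hypothetical sketch of the "plug and play" auto-tune idea: benchmark each
# (workers, threads-per-worker) split, then pick the split with the highest
# total throughput. The timings below are made-up illustrative numbers,
# not real Prime95 benchmark output.
def best_split(timings):
    """timings: {(workers, threads_per_worker): ms_per_iteration}.
    Total throughput is workers / ms_per_iteration, since each worker
    grinds its own exponent in parallel."""
    return max(timings, key=lambda wt: wt[0] / timings[wt])

example = {
    (1, 14): 10.0,   # one worker on all 14 cores: 0.100 iters/ms total
    (2, 7): 21.0,    # two workers of 7 cores:     2/21 ≈ 0.095 iters/ms
    (14, 1): 160.0,  # 14 single-core workers:     14/160 ≈ 0.088 iters/ms
}
print(best_split(example))
```

On these invented numbers the single 14-core worker wins, mirroring the cache argument earlier in the thread.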
Old 2016-02-05, 23:11   #655
Madpoo
Serpentine Vermin Jar
 
 
Jul 2014

3,313 Posts

Quote:
Originally Posted by xtreme2k View Post
That is very interesting. I guess when we move to 5120K FFTs I would lose the advantage. In fact, we aren't that far away.

Should've got the 2696v3/2699v3 with 45MB L3 then
LOL... more is better. :)

In the case of the 16/18-core Haswell chips, they're clocked slightly lower than the 14-core part, but I'd guess the extra cores still make them slightly faster overall.

AirSquirrels proved this by doing a check on the same exponent as me... him with dual 16-core Xeons, me with dual 14-core.

In my case I could only scale up to 20-22 cores (14 on one chip, the rest on the other) before adding more stopped improving performance and actually started hindering it.

In his case, he had all 32 cores working together and says he didn't see any drop in speed, but I don't know if he tested other core counts like 24, 26, 28, 30, etc.

In the end, what took me 34 hours took him around 32 or something (it was the verification for M49).

Looks like Broadwell E5 will have the same 2.5 MB L3 cache per core... I haven't seen any hard info on the L3 size (per core) on Skylake, but it's probably still 2.5 MB unless they surprise us all with 3-4 MB.

I'm still waiting for Knights Landing and its up to 16 GB of fast on-package memory, which I guess you could call L4 if the system is configured to use it as a fast cache in front of main RAM.

Again, no hard numbers on how the KNL memory bandwidth will compare with the L3 speeds we enjoy now, but it should be faster than 6-channel DDR4, and that's a very good thing.
Old 2016-03-01, 23:08   #656
xtreme2k
 
 
Aug 2002

2×3×29 Posts

I ran the benchmark again, focusing on 4096K+ FFTs. It takes an incredibly long time to run, and most of that time goes to the different combos of workers and threads. It would be good if George would consider the following.
  • giving us an indication of FFT size versus (optimal/preferred/acceptable) cache usage per worker/core
  • allowing us to select 1 worker/n cores and then n workers/n cores, skipping the in-between worker/core combinations; on a many-core system it just takes forever to run :)

For my initial results, even at 8192K FFT the 2697v3 definitely prefers 1w/14c over 14w/14c. However, the advantage of 1w/14c over 14w/14c is smaller than at 4096K.

Last fiddled with by xtreme2k on 2016-03-01 at 23:13
Old 2016-06-12, 14:01   #657
WraithX
 
 
Mar 2006

13×37 Posts

Benchmarks for my E5-2687W v4 (12 cores, 3.0GHz) with 128GB = 8x 16GB DDR4-2133 ECC Reg CL15 on Windows 7 Pro SP1 x64.

I'm thinking of rerunning the test with 64GB = 4x 16GB DDR4-2133, to see if a single DIMM per channel makes a difference. In these initial tests, I did see a couple of oddly high timings pop up. I think this may have happened because it hit the second DIMM on the memory channel (maybe?).

The 1st benchmark was run after downloading Prime95 28.9 and selecting Benchmark from the menu.

The 2nd benchmark used the following in prime.txt:
FullBench=1

The 3rd benchmark used the following in prime.txt:
MinBenchFFT=4096
MaxBenchFFT=4096
BenchHyperthreads=0
BenchMultithreads=1

The 4th benchmark used the following in prime.txt:
MinBenchFFT=8192
MaxBenchFFT=8192
BenchHyperthreads=0
BenchMultithreads=1

The 5th benchmark used the following in prime.txt:
FullBench=1
BenchHyperthreads=0
BenchMultithreads=1

I know this is a very small part of the program, but I think it would be very helpful if the Benchmark menu option brought up an options dialog, letting you quickly choose among the above options and perhaps save the results to different files.

Maybe the dialog could contain:
  • A radio button to select between "Standard Benchmark", "Full Benchmark", and maybe "Custom Benchmark"
  • A check box to use MinBenchFFT, which gets its input from a drop-down menu (I know this would be large with 127 options, but better than error-checking user input)
  • A check box to use MaxBenchFFT, which gets its input from a drop-down menu
  • A check box to bench hyperthreads, if that is a) available for the processor and b) turned on in the BIOS
  • A check box to bench multithreads, if the processor has multiple cores
  • Perhaps a way to specify the affinity map, to see how that affects benchmarks
  • Perhaps a way to specify a file name to save these benchmark results to a separate file
  • And controls for any other benchmark options I don't know about
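Saving results to separate files would also make runs easy to compare programmatically. A small illustrative parser for the timing lines Prime95 prints (the regex targets the exact line format quoted in the benchmark posts in this thread):

```python
import re

# Illustrative parser for Prime95 benchmark output lines of the form:
# "[Work thread Jun 16 14:02] Timing 25 iterations of 4096K FFT length.
#  Best time: 21.407 ms., avg time: 21.609 ms."
LINE = re.compile(
    r"Timing \d+ iterations of (\d+)K FFT length\. "
    r"Best time: ([\d.]+) ms\., avg time: ([\d.]+) ms\."
)

def parse_bench(text):
    """Return {fft_k: (best_ms, avg_ms)} for every timing line in text."""
    out = {}
    for m in LINE.finditer(text):
        out[int(m.group(1))] = (float(m.group(2)), float(m.group(3)))
    return out

sample = ("[Work thread Jun 16 14:02] Timing 25 iterations of 4096K FFT "
          "length. Best time: 21.407 ms., avg time: 21.609 ms.")
print(parse_bench(sample))  # {4096: (21.407, 21.609)}
```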
Attached Files
File Type: zip benchmarks.zip (82.0 KB, 103 views)

Last fiddled with by WraithX on 2016-06-12 at 14:04
Old 2016-06-13, 16:30   #658
Madpoo
Serpentine Vermin Jar
 
 
Jul 2014

6361₈ Posts

Quote:
Originally Posted by WraithX View Post
Benchmarks for my E5-2687W v4 (12 cores, 3.0GHz) with 128GB = 8x 16GB DDR4-2133 ECC Reg CL15 on Windows 7 Pro SP1 x64.

I'm thinking of rerunning the test with 64GB = 4x 16GB DDR4-2133, to see if a single DIMM per channel makes a difference. In these initial tests, I did see a couple of oddly high timings pop up. I think this may have happened because it hit the second DIMM on the memory channel (maybe?).
I expect to get a delivery of a dual Xeon E5-2690 v4 (14 cores @ 2.6 GHz) in the next couple weeks. I'll be interested to put it through the benchmark and see how it compares to a E5-2697 v3 which is also 14 cores @ 2.6 GHz.

The differences will be the architecture itself (Broadwell) plus the v4 using DDR4-2400 instead of the v3's DDR4-2133. The memory speed alone should make a big difference.

Any reason you're using 2133 instead of 2400 memory? The E5-2687W v4 supports up to 2400, and since memory is the bottleneck with Prime95, you might take a look at that.

Depending on your motherboard, it may or may not reduce the clock on the memory if you're running 2 DPC. HP servers are "fun" in that if you're using official HP memory, you can run full speed with 2 DPC, but if you cheap out and use 3rd party memory, the BIOS will step down the memory speed from, for example, 1866 to 1600. That might not hold true for the gen9 boxes, but it's definitely the case on their gen8 Proliant servers with Xeon E5 v1/v2 processors.

It can also vary depending on the ranks of each module. I note with curiosity that on the new server I'm getting, it'll run 2DPC @ 2400 MHz if the modules are dual ranked 16GB, but if you use the single-ranked 16GB modules, it runs @ 2133 with 2DPC.

Doesn't matter to me, I'm only doing one DPC, so 8 across both CPUs, but it's curious. It even says I could use load reduced DIMMs (LRDIMMs) and get 3DPC at the full 2400.

I'm imagining a system now with a full 24 x 128GB LRDIMM @ 2400 for a total of 3 TB of RAM. Pair that with a couple of 22-core E5-2699v4 and you've got yourself one heckuva virtual host machine... Or better yet, 4 of those CPUs once the E5-46xx v4 chips come out.
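For rough context on why the 2133-vs-2400 question matters: theoretical peak bandwidth per socket is channels × transfer rate × 8 bytes per 64-bit transfer. A quick sketch (quad-channel assumed for these E5 parts; real sustained bandwidth will be lower):

```python
# Back-of-envelope peak memory bandwidth per socket:
# channels * MT/s * 8 bytes per 64-bit transfer, in decimal GB/s.
def peak_gbps(channels: int, mts: int) -> float:
    return channels * mts * 8 / 1000

ddr4_2133 = peak_gbps(4, 2133)  # quad-channel DDR4-2133
ddr4_2400 = peak_gbps(4, 2400)  # quad-channel DDR4-2400
print(f"2133: {ddr4_2133:.1f} GB/s, 2400: {ddr4_2400:.1f} GB/s, "
      f"gain: {ddr4_2400 / ddr4_2133 - 1:.1%}")
```

That's roughly 68 vs 77 GB/s per socket, a ~12.5% headline gain, which is why a BIOS quietly stepping 2DPC memory down a speed bin is worth watching for.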
Old 2016-06-13, 17:55   #659
GP2
 
 
Sep 2003

5×11×47 Posts

Quote:
Originally Posted by Madpoo View Post
I expect to get a delivery of a dual Xeon E5-2690 v4 (14 cores @ 2.6 GHz) in the next couple weeks.
How many watts does one of these use when it's doing LL tests at full capacity? Not including the additional air-conditioning burden, which may be hard to calculate directly.
Old 2016-06-13, 18:09   #660
Mark Rose
 
 
"/X\(‘-‘)/X\"
Jan 2013

2×5×293 Posts

The processor has a design power of 135 watts. Throw in some memory and chipset watts and powersupply inefficiency and you're probably close to 175 watts per processor.
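For what it's worth, that estimate can be reconstructed as TDP plus assumed overheads (the overhead and efficiency figures below are guesses for illustration, not measurements):

```python
# Rough wall-power estimate per processor: CPU TDP plus an assumed share of
# memory/chipset power, divided by an assumed PSU efficiency.
tdp_w = 135            # CPU design power (from the post above)
mem_chipset_w = 20     # assumed: DIMMs + chipset share
psu_efficiency = 0.90  # assumed: a decent 80 PLUS supply

wall_w = (tdp_w + mem_chipset_w) / psu_efficiency
print(f"~{wall_w:.0f} W per processor at the wall")
```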
Old 2016-06-13, 18:37   #661
airsquirrels
 
 
"David"
Jul 2015
Ohio

11×47 Posts

Quote:
Originally Posted by Mark Rose View Post
The processor has a design power of 135 watts. Throw in some memory and chipset watts and powersupply inefficiency and you're probably close to 175 watts per processor.
My LGA 2011-v3 16-core chips have a 135 W TDP, but the entire system's power consumption goes up by only 110 W from idle when I ramp up P95.
Old 2016-06-14, 00:26   #662
WraithX
 
 
Mar 2006

13×37 Posts

Quote:
Originally Posted by Madpoo View Post
I expect to get a delivery of a dual Xeon E5-2690 v4 (14 cores @ 2.6 GHz) in the next couple weeks. I'll be interested to put it through the benchmark and see how it compares to a E5-2697 v3 which is also 14 cores @ 2.6 GHz.

The differences will be the architecture itself (Broadwell) plus the v4 will have DDR4-2400 instead of DDR4-2133 for the v3. The mem speed alone would make a big difference.
That sounds like it'll be a nice machine. Someday I'll save up enough to make this a dual 2687Wv4.

Quote:
Originally Posted by Madpoo View Post
Any reason you're using 2133 instead of 2400 memory? The E5-2687W v4 supports up to 2400, and since memory is the bottleneck with Prime95, you might take a look at that.

Depending on your motherboard, it may or may not reduce the clock on the memory if you're running 2 DPC.

I'm imagining a system now with a full 24 x 128GB LRDIMM @ 2400 for a total of 3 TB of RAM. Pair that with a couple of 22-core E5-2699v4 and you've got yourself one heckuva virtual host machine... Or better yet, 4 of those CPUs once the E5-46xx v4 chips come out.
Yeah. This is my home machine, and the only workstation-style motherboard I could find (that I wanted to buy) listed support only up to 2133 memory. It said it would drop to 1866 with 2DPC, but I guess the BIOS update that added support for the v4 Xeons also allowed it to stay at 2133 with 2DPC. This was funny in and of itself, because I had to update the BIOS before I could use my v4s. I called around town, but nobody had v3 Xeons (to purchase or to temporarily borrow for a BIOS update). So I had to buy the cheapest v3 I could find just to update my BIOS so I could then use my v4s! There's a small chance that with the BIOS update the board now supports 2400, but I don't have the RAM to test with.

Whoa, 3 TB of RAM, that would be amazing! Btw, I wouldn't make it a VM host, I'd just run one OS and fill it up with ECM jobs!
Old 2016-06-14, 11:00   #663
xtreme2k
 
 
Aug 2002

256₈ Posts

Quote:
Originally Posted by Madpoo View Post
I expect to get a delivery of a dual Xeon E5-2690 v4 (14 cores @ 2.6 GHz) in the next couple weeks. I'll be interested to put it through the benchmark and see how it compares to a E5-2697 v3 which is also 14 cores @ 2.6 GHz.
I'd love to see the benchmark comparisons between the v4 and the v3, and 2400 vs 2133 memory.

All I need are the 4096K 1w/14t times.
Old 2016-06-16, 14:25   #664
Mark Rose
 
 
"/X\(‘-‘)/X\"
Jan 2013

2·5·293 Posts

So I ran some quick benchmarks:

c3.large:

[Work thread Jun 16 14:02] Timing 34 iterations of 1024K FFT length. Best time: 6.853 ms., avg time: 6.865 ms.
[Work thread Jun 16 14:02] Timing 27 iterations of 1280K FFT length. Best time: 8.677 ms., avg time: 8.706 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1536K FFT length. Best time: 10.443 ms., avg time: 10.483 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1792K FFT length. Best time: 12.547 ms., avg time: 12.623 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2048K FFT length. Best time: 14.705 ms., avg time: 14.766 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2560K FFT length. Best time: 17.644 ms., avg time: 17.709 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3072K FFT length. Best time: 21.346 ms., avg time: 21.412 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3584K FFT length. Best time: 25.685 ms., avg time: 25.720 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 4096K FFT length. Best time: 32.316 ms., avg time: 32.410 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 5120K FFT length. Best time: 40.270 ms., avg time: 40.385 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 6144K FFT length. Best time: 48.339 ms., avg time: 48.436 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 7168K FFT length. Best time: 57.406 ms., avg time: 57.462 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 8192K FFT length. Best time: 67.754 ms., avg time: 67.854 ms.
[Work thread Jun 16 14:02] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 16 14:02] Timing 34 iterations of 1024K FFT length. Best time: 7.234 ms., avg time: 7.271 ms.
[Work thread Jun 16 14:02] Timing 27 iterations of 1280K FFT length. Best time: 9.249 ms., avg time: 9.329 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1536K FFT length. Best time: 11.145 ms., avg time: 11.171 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1792K FFT length. Best time: 13.310 ms., avg time: 13.389 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2048K FFT length. Best time: 15.715 ms., avg time: 15.935 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2560K FFT length. Best time: 19.288 ms., avg time: 19.469 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 3072K FFT length. Best time: 24.058 ms., avg time: 24.236 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 3584K FFT length. Best time: 30.966 ms., avg time: 33.027 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 4096K FFT length. Best time: 32.997 ms., avg time: 33.571 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 5120K FFT length. Best time: 41.625 ms., avg time: 42.549 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 6144K FFT length. Best time: 50.459 ms., avg time: 50.620 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 7168K FFT length. Best time: 59.329 ms., avg time: 59.725 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 8192K FFT length. Best time: 71.724 ms., avg time: 72.220 ms.

c4.large:

[Work thread Jun 16 14:02] Timing 36 iterations of 1024K FFT length. Best time: 4.640 ms., avg time: 4.649 ms.
[Work thread Jun 16 14:02] Timing 29 iterations of 1280K FFT length. Best time: 5.899 ms., avg time: 5.908 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1536K FFT length. Best time: 7.101 ms., avg time: 7.116 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1792K FFT length. Best time: 8.769 ms., avg time: 8.781 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2048K FFT length. Best time: 9.593 ms., avg time: 9.661 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2560K FFT length. Best time: 12.410 ms., avg time: 12.563 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3072K FFT length. Best time: 15.232 ms., avg time: 15.350 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3584K FFT length. Best time: 18.626 ms., avg time: 18.801 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 4096K FFT length. Best time: 21.407 ms., avg time: 21.609 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 5120K FFT length. Best time: 27.879 ms., avg time: 27.976 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 6144K FFT length. Best time: 33.080 ms., avg time: 33.177 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 7168K FFT length. Best time: 39.618 ms., avg time: 39.721 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 8192K FFT length. Best time: 46.036 ms., avg time: 46.128 ms.
[Work thread Jun 16 14:02] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 16 14:02] Timing 36 iterations of 1024K FFT length. Best time: 4.547 ms., avg time: 4.571 ms.
[Work thread Jun 16 14:02] Timing 29 iterations of 1280K FFT length. Best time: 5.721 ms., avg time: 5.767 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1536K FFT length. Best time: 6.995 ms., avg time: 7.050 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1792K FFT length. Best time: 8.491 ms., avg time: 8.570 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2048K FFT length. Best time: 10.148 ms., avg time: 10.372 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2560K FFT length. Best time: 12.506 ms., avg time: 12.931 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3072K FFT length. Best time: 15.535 ms., avg time: 15.826 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3584K FFT length. Best time: 19.055 ms., avg time: 19.465 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 4096K FFT length. Best time: 21.606 ms., avg time: 22.175 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 5120K FFT length. Best time: 28.121 ms., avg time: 28.367 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 6144K FFT length. Best time: 36.025 ms., avg time: 36.362 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 7168K FFT length. Best time: 42.364 ms., avg time: 43.682 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 8192K FFT length. Best time: 49.345 ms., avg time: 50.905 ms.

A 4770 at stock clocks with 1600 MHz RAM:

[Work thread Jun 16 10:11] Timing 44 iterations of 1024K FFT length. Best time: 4.281 ms., avg time: 4.897 ms.
[Work thread Jun 16 10:11] Timing 35 iterations of 1280K FFT length. Best time: 5.537 ms., avg time: 5.642 ms.
[Work thread Jun 16 10:11] Timing 29 iterations of 1536K FFT length. Best time: 6.675 ms., avg time: 6.781 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 1792K FFT length. Best time: 8.151 ms., avg time: 8.184 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 2048K FFT length. Best time: 9.124 ms., avg time: 9.188 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 2560K FFT length. Best time: 11.693 ms., avg time: 15.343 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 3072K FFT length. Best time: 14.091 ms., avg time: 14.156 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 3584K FFT length. Best time: 16.791 ms., avg time: 17.066 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 4096K FFT length. Best time: 19.295 ms., avg time: 24.632 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 5120K FFT length. Best time: 24.642 ms., avg time: 24.689 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 6144K FFT length. Best time: 29.367 ms., avg time: 33.022 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 7168K FFT length. Best time: 34.989 ms., avg time: 37.109 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 8192K FFT length. Best time: 40.438 ms., avg time: 40.463 ms.
[Work thread Jun 16 10:11] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 16 10:11] Timing 44 iterations of 1024K FFT length. Best time: 4.140 ms., avg time: 4.159 ms.
[Work thread Jun 16 10:11] Timing 35 iterations of 1280K FFT length. Best time: 5.422 ms., avg time: 5.643 ms.
[Work thread Jun 16 10:11] Timing 29 iterations of 1536K FFT length. Best time: 6.603 ms., avg time: 6.682 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 1792K FFT length. Best time: 7.751 ms., avg time: 7.832 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 2048K FFT length. Best time: 9.457 ms., avg time: 9.598 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 2560K FFT length. Best time: 11.483 ms., avg time: 11.732 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 3072K FFT length. Best time: 13.747 ms., avg time: 14.136 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 3584K FFT length. Best time: 16.652 ms., avg time: 17.001 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 4096K FFT length. Best time: 19.166 ms., avg time: 19.413 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 5120K FFT length. Best time: 24.249 ms., avg time: 24.386 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 6144K FFT length. Best time: 30.548 ms., avg time: 31.100 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 7168K FFT length. Best time: 36.463 ms., avg time: 36.643 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 8192K FFT length. Best time: 41.483 ms., avg time: 42.415 ms.

And a 4770k at stock clocks with 2400 MHz RAM:

[Work thread Jun 16 10:06] Timing 43 iterations of 1024K FFT length. Best time: 3.991 ms., avg time: 4.131 ms.
[Work thread Jun 16 10:06] Timing 34 iterations of 1280K FFT length. Best time: 5.160 ms., avg time: 5.196 ms.
[Work thread Jun 16 10:06] Timing 29 iterations of 1536K FFT length. Best time: 6.167 ms., avg time: 6.213 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 1792K FFT length. Best time: 7.617 ms., avg time: 7.806 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 2048K FFT length. Best time: 8.758 ms., avg time: 8.876 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 2560K FFT length. Best time: 10.741 ms., avg time: 10.937 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 3072K FFT length. Best time: 12.947 ms., avg time: 13.238 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 3584K FFT length. Best time: 15.444 ms., avg time: 15.773 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 4096K FFT length. Best time: 17.717 ms., avg time: 17.869 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 5120K FFT length. Best time: 22.808 ms., avg time: 23.032 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 6144K FFT length. Best time: 27.089 ms., avg time: 27.559 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 7168K FFT length. Best time: 31.962 ms., avg time: 32.342 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 8192K FFT length. Best time: 37.185 ms., avg time: 37.336 ms.
[Work thread Jun 16 10:06] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 16 10:06] Timing 43 iterations of 1024K FFT length. Best time: 3.879 ms., avg time: 3.929 ms.
[Work thread Jun 16 10:06] Timing 34 iterations of 1280K FFT length. Best time: 4.962 ms., avg time: 5.006 ms.
[Work thread Jun 16 10:06] Timing 29 iterations of 1536K FFT length. Best time: 6.096 ms., avg time: 6.119 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 1792K FFT length. Best time: 7.345 ms., avg time: 7.377 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 2048K FFT length. Best time: 8.563 ms., avg time: 8.803 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 2560K FFT length. Best time: 10.408 ms., avg time: 10.574 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 3072K FFT length. Best time: 12.569 ms., avg time: 12.814 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 3584K FFT length. Best time: 15.082 ms., avg time: 15.255 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 4096K FFT length. Best time: 17.042 ms., avg time: 17.344 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 5120K FFT length. Best time: 22.032 ms., avg time: 22.206 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 6144K FFT length. Best time: 27.740 ms., avg time: 28.096 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 7168K FFT length. Best time: 32.233 ms., avg time: 32.927 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 8192K FFT length. Best time: 37.702 ms., avg time: 38.482 ms.

This doesn't say much that we didn't already know. The c3.large and c4.large each expose two hyperthreads of a single CPU core.

The E5-2666v3 in the c4.large has a turbo of 3.3 GHz, but /proc/cpuinfo shows it running at 2.9 GHz.

The E5-2680v2 in the c3.large has a turbo of 3.6 GHz, but /proc/cpuinfo shows 2.8 GHz, and it also lacks AVX2.

So a c4.large actually does pretty well for its clock speed with little virtualization overhead.
Old 2016-06-19, 04:02   #665
Mark Rose
 
 
"/X\(‘-‘)/X\"
Jan 2013

101101110010₂ Posts

Intel i3-4170 @ 3.7 GHz with DDR3-1333

[Work thread Jun 18 23:41] Timing 46 iterations of 1024K FFT length. Best time: 4.383 ms., avg time: 4.397 ms.
[Work thread Jun 18 23:41] Timing 36 iterations of 1280K FFT length. Best time: 5.575 ms., avg time: 5.584 ms.
[Work thread Jun 18 23:41] Timing 30 iterations of 1536K FFT length. Best time: 6.665 ms., avg time: 6.679 ms.
[Work thread Jun 18 23:41] Timing 26 iterations of 1792K FFT length. Best time: 8.349 ms., avg time: 8.662 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 2048K FFT length. Best time: 9.112 ms., avg time: 9.130 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 2560K FFT length. Best time: 11.567 ms., avg time: 11.588 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 3072K FFT length. Best time: 13.935 ms., avg time: 13.950 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 3584K FFT length. Best time: 16.531 ms., avg time: 16.549 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 4096K FFT length. Best time: 19.045 ms., avg time: 19.069 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 5120K FFT length. Best time: 24.178 ms., avg time: 24.200 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 6144K FFT length. Best time: 28.969 ms., avg time: 28.996 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 7168K FFT length. Best time: 34.644 ms., avg time: 34.673 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 8192K FFT length. Best time: 39.996 ms., avg time: 40.030 ms.
[Work thread Jun 18 23:41] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 18 23:41] Timing 46 iterations of 1024K FFT length. Best time: 4.590 ms., avg time: 4.607 ms.
[Work thread Jun 18 23:41] Timing 36 iterations of 1280K FFT length. Best time: 5.864 ms., avg time: 5.898 ms.
[Work thread Jun 18 23:41] Timing 30 iterations of 1536K FFT length. Best time: 7.092 ms., avg time: 7.133 ms.
[Work thread Jun 18 23:41] Timing 26 iterations of 1792K FFT length. Best time: 8.087 ms., avg time: 8.183 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 2048K FFT length. Best time: 9.698 ms., avg time: 9.780 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 2560K FFT length. Best time: 11.710 ms., avg time: 11.870 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 3072K FFT length. Best time: 14.284 ms., avg time: 14.384 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 3584K FFT length. Best time: 17.137 ms., avg time: 17.236 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 4096K FFT length. Best time: 19.351 ms., avg time: 19.594 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 5120K FFT length. Best time: 24.615 ms., avg time: 24.733 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 6144K FFT length. Best time: 30.627 ms., avg time: 30.673 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 7168K FFT length. Best time: 36.243 ms., avg time: 36.295 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 8192K FFT length. Best time: 42.296 ms., avg time: 42.369 ms.
[Work thread Jun 18 23:42] Timing FFTs using 2 threads on 2 physical CPUs.
[Work thread Jun 18 23:42] Timing 46 iterations of 1024K FFT length. Best time: 2.774 ms., avg time: 2.800 ms.
[Work thread Jun 18 23:42] Timing 36 iterations of 1280K FFT length. Best time: 3.636 ms., avg time: 3.658 ms.
[Work thread Jun 18 23:42] Timing 30 iterations of 1536K FFT length. Best time: 3.753 ms., avg time: 3.815 ms.
[Work thread Jun 18 23:42] Timing 26 iterations of 1792K FFT length. Best time: 4.565 ms., avg time: 4.680 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 2048K FFT length. Best time: 5.184 ms., avg time: 5.226 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 2560K FFT length. Best time: 6.801 ms., avg time: 6.818 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 3072K FFT length. Best time: 8.201 ms., avg time: 8.223 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 3584K FFT length. Best time: 9.548 ms., avg time: 9.641 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 4096K FFT length. Best time: 10.746 ms., avg time: 10.757 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 5120K FFT length. Best time: 13.467 ms., avg time: 13.509 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 6144K FFT length. Best time: 16.185 ms., avg time: 16.287 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 7168K FFT length. Best time: 19.323 ms., avg time: 19.439 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 8192K FFT length. Best time: 22.326 ms., avg time: 22.443 ms.
[Work thread Jun 18 23:42] Timing FFTs using 4 threads on 2 physical CPUs.
[Work thread Jun 18 23:42] Timing 46 iterations of 1024K FFT length. Best time: 2.793 ms., avg time: 2.822 ms.
[Work thread Jun 18 23:42] Timing 36 iterations of 1280K FFT length. Best time: 3.612 ms., avg time: 3.633 ms.
[Work thread Jun 18 23:42] Timing 30 iterations of 1536K FFT length. Best time: 4.079 ms., avg time: 4.153 ms.
[Work thread Jun 18 23:42] Timing 26 iterations of 1792K FFT length. Best time: 4.796 ms., avg time: 4.827 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 2048K FFT length. Best time: 5.408 ms., avg time: 5.457 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 2560K FFT length. Best time: 6.644 ms., avg time: 6.709 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 3072K FFT length. Best time: 8.018 ms., avg time: 8.104 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 3584K FFT length. Best time: 9.744 ms., avg time: 9.861 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 4096K FFT length. Best time: 10.767 ms., avg time: 10.864 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 5120K FFT length. Best time: 13.628 ms., avg time: 13.740 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 6144K FFT length. Best time: 17.180 ms., avg time: 17.329 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 7168K FFT length. Best time: 19.999 ms., avg time: 20.658 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 8192K FFT length. Best time: 23.422 ms., avg time: 23.981 ms.

Perhaps not so interesting given that it's older hardware, but it's good to note that the two cores aren't held back by DDR3-1333 at all. I thought I had been memory-bound before, but I was wrong.
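As a rough rule of thumb (my own back-of-envelope, assuming 8 bytes per double-precision FFT element and ignoring prime95's trig tables and scratch buffers), you can estimate whether an FFT's working set fits in cache:

```python
# Rough working-set estimate for a double-precision FFT of length N*1024:
# N*1024 doubles at 8 bytes each.  Ignores sin/cos tables and scratch
# space, so the real footprint is somewhat larger ("plus change").
def fft_working_set_mb(fft_len_k):
    return fft_len_k * 1024 * 8 / 2**20

for k in (2048, 4096, 5120, 8192):
    print(f"{k}K FFT: ~{fft_working_set_mb(k):.0f} MB")
```

By this estimate a 4096K FFT (~32 MB) still fits the 35 MB Xeon L3 axn mentioned, while 5120K and up spill out to RAM, which is where memory speed starts to matter.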
Mark Rose is offline   Reply With Quote
Old 2016-06-22, 16:31   #666
bmurray7JHU
 
Jun 2016

1 Posts
Default Intel i7-6950X

My employer let me build a new workstation with an i7-6950X.

Specs
  • i7-6950X with a conservative overclock to 3.9 GHz.
  • 128GB (8 * 16GB) of DDR4-2400 RAM
  • Fedora 22


Code:
[Wed Jun 22 09:13:23 2016]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz
CPU speed: 1864.58 MHz, 10 cores
CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 25 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.9, RdtscTiming=1
Best time for 1024K FFT length: 5.010 ms., avg: 5.024 ms.
Best time for 1280K FFT length: 4.859 ms., avg: 4.911 ms.
Best time for 1536K FFT length: 5.884 ms., avg: 5.920 ms.
Best time for 1792K FFT length: 7.196 ms., avg: 7.249 ms.
Best time for 2048K FFT length: 8.002 ms., avg: 8.054 ms.
Best time for 2560K FFT length: 10.269 ms., avg: 10.512 ms.
Best time for 3072K FFT length: 12.585 ms., avg: 12.743 ms.
Best time for 3584K FFT length: 15.504 ms., avg: 15.831 ms.
Best time for 4096K FFT length: 18.191 ms., avg: 18.324 ms.
Best time for 5120K FFT length: 23.629 ms., avg: 24.516 ms.
Best time for 6144K FFT length: 29.791 ms., avg: 30.115 ms.
Best time for 7168K FFT length: 35.978 ms., avg: 36.637 ms.
Best time for 8192K FFT length: 41.911 ms., avg: 48.057 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 2.601 ms., avg: 2.809 ms.
Best time for 1280K FFT length: 2.744 ms., avg: 2.945 ms.
Best time for 1536K FFT length: 3.077 ms., avg: 3.157 ms.
Best time for 1792K FFT length: 3.834 ms., avg: 4.026 ms.
Best time for 2048K FFT length: 5.140 ms., avg: 5.173 ms.
Best time for 2560K FFT length: 6.094 ms., avg: 6.924 ms.
Best time for 3072K FFT length: 7.553 ms., avg: 8.138 ms.
Best time for 3584K FFT length: 9.036 ms., avg: 9.947 ms.
Best time for 4096K FFT length: 10.487 ms., avg: 11.013 ms.
Best time for 5120K FFT length: 13.678 ms., avg: 14.730 ms.
Best time for 6144K FFT length: 16.925 ms., avg: 17.189 ms.
Best time for 7168K FFT length: 18.428 ms., avg: 19.795 ms.
Best time for 8192K FFT length: 23.605 ms., avg: 24.299 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 1.659 ms., avg: 1.742 ms.
Best time for 1280K FFT length: 2.581 ms., avg: 2.834 ms.
Best time for 1536K FFT length: 3.040 ms., avg: 3.487 ms.
Best time for 1792K FFT length: 3.332 ms., avg: 3.802 ms.
Best time for 2048K FFT length: 3.939 ms., avg: 4.596 ms.
Best time for 2560K FFT length: 4.491 ms., avg: 5.520 ms.
Best time for 3072K FFT length: 5.225 ms., avg: 6.449 ms.
Best time for 3584K FFT length: 6.344 ms., avg: 7.177 ms.
Best time for 4096K FFT length: 7.342 ms., avg: 8.583 ms.
Best time for 5120K FFT length: 9.497 ms., avg: 10.556 ms.
Best time for 6144K FFT length: 11.696 ms., avg: 12.928 ms.
Best time for 7168K FFT length: 14.071 ms., avg: 15.242 ms.
Best time for 8192K FFT length: 14.720 ms., avg: 15.956 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 1.183 ms., avg: 1.262 ms.
Best time for 1280K FFT length: 1.818 ms., avg: 2.043 ms.
Best time for 1536K FFT length: 2.346 ms., avg: 2.580 ms.
Best time for 1792K FFT length: 2.873 ms., avg: 3.277 ms.
Best time for 2048K FFT length: 3.189 ms., avg: 3.702 ms.
Best time for 2560K FFT length: 3.482 ms., avg: 4.478 ms.
Best time for 3072K FFT length: 4.980 ms., avg: 6.118 ms.
Best time for 3584K FFT length: 4.863 ms., avg: 5.857 ms.
Best time for 4096K FFT length: 6.378 ms., avg: 7.342 ms.
Best time for 5120K FFT length: 7.293 ms., avg: 8.274 ms.
Best time for 6144K FFT length: 9.018 ms., avg: 10.252 ms.
Best time for 7168K FFT length: 10.839 ms., avg: 12.003 ms.
Best time for 8192K FFT length: 12.682 ms., avg: 13.563 ms.
Timing FFTs using 5 threads.
Best time for 1024K FFT length: 1.366 ms., avg: 1.476 ms.
Best time for 1280K FFT length: 1.676 ms., avg: 1.832 ms.
Best time for 1536K FFT length: 2.203 ms., avg: 2.537 ms.
Best time for 1792K FFT length: 2.633 ms., avg: 3.061 ms.
Best time for 2048K FFT length: 2.712 ms., avg: 3.206 ms.
Best time for 2560K FFT length: 3.296 ms., avg: 3.991 ms.
Best time for 3072K FFT length: 3.696 ms., avg: 4.553 ms.
Best time for 3584K FFT length: 4.296 ms., avg: 5.309 ms.
Best time for 4096K FFT length: 4.840 ms., avg: 5.910 ms.
Best time for 5120K FFT length: 5.958 ms., avg: 7.189 ms.
Best time for 6144K FFT length: 7.988 ms., avg: 8.980 ms.
Best time for 7168K FFT length: 8.853 ms., avg: 10.024 ms.
Best time for 8192K FFT length: 10.258 ms., avg: 11.330 ms.
Timing FFTs using 6 threads.
Best time for 1024K FFT length: 1.361 ms., avg: 1.501 ms.
Best time for 1280K FFT length: 1.843 ms., avg: 1.988 ms.
Best time for 1536K FFT length: 1.714 ms., avg: 1.874 ms.
Best time for 1792K FFT length: 2.357 ms., avg: 2.700 ms.
Best time for 2048K FFT length: 2.444 ms., avg: 2.887 ms.
Best time for 2560K FFT length: 2.870 ms., avg: 3.454 ms.
Best time for 3072K FFT length: 3.190 ms., avg: 4.154 ms.
Best time for 3584K FFT length: 3.940 ms., avg: 4.873 ms.
Best time for 4096K FFT length: 4.169 ms., avg: 5.256 ms.
Best time for 5120K FFT length: 5.218 ms., avg: 6.543 ms.
Best time for 6144K FFT length: 6.240 ms., avg: 7.421 ms.
Best time for 7168K FFT length: 7.545 ms., avg: 8.910 ms.
Best time for 8192K FFT length: 8.722 ms., avg: 9.837 ms.
Timing FFTs using 7 threads.
Best time for 1024K FFT length: 1.156 ms., avg: 1.308 ms.
Best time for 1280K FFT length: 1.651 ms., avg: 1.858 ms.
Best time for 1536K FFT length: 1.576 ms., avg: 1.824 ms.
Best time for 1792K FFT length: 2.075 ms., avg: 2.591 ms.
Best time for 2048K FFT length: 1.841 ms., avg: 2.460 ms.
Best time for 2560K FFT length: 2.533 ms., avg: 3.028 ms.
Best time for 3072K FFT length: 2.907 ms., avg: 3.624 ms.
Best time for 3584K FFT length: 3.522 ms., avg: 4.378 ms.
Best time for 4096K FFT length: 3.664 ms., avg: 4.673 ms.
Best time for 5120K FFT length: 4.722 ms., avg: 5.834 ms.
Best time for 6144K FFT length: 5.536 ms., avg: 6.690 ms.
Best time for 7168K FFT length: 6.521 ms., avg: 7.729 ms.
Best time for 8192K FFT length: 7.587 ms., avg: 9.024 ms.
Timing FFTs using 8 threads.
Best time for 1024K FFT length: 1.048 ms., avg: 1.120 ms.
Best time for 1280K FFT length: 1.345 ms., avg: 1.469 ms.
Best time for 1536K FFT length: 1.618 ms., avg: 1.846 ms.
Best time for 1792K FFT length: 1.929 ms., avg: 2.233 ms.
Best time for 2048K FFT length: 2.006 ms., avg: 2.307 ms.
Best time for 2560K FFT length: 2.455 ms., avg: 3.002 ms.
Best time for 3072K FFT length: 2.742 ms., avg: 3.401 ms.
Best time for 3584K FFT length: 3.168 ms., avg: 3.952 ms.
Best time for 4096K FFT length: 3.323 ms., avg: 4.260 ms.
Best time for 5120K FFT length: 4.069 ms., avg: 5.349 ms.
Best time for 6144K FFT length: 4.934 ms., avg: 6.162 ms.
Best time for 7168K FFT length: 5.784 ms., avg: 7.071 ms.
Best time for 8192K FFT length: 6.815 ms., avg: 7.912 ms.
Timing FFTs using 9 threads.
Best time for 1024K FFT length: 0.806 ms., avg: 0.876 ms.
Best time for 1280K FFT length: 1.271 ms., avg: 1.470 ms.
Best time for 1536K FFT length: 1.476 ms., avg: 1.732 ms.
Best time for 1792K FFT length: 1.839 ms., avg: 2.096 ms.
Best time for 2048K FFT length: 1.833 ms., avg: 2.100 ms.
Best time for 2560K FFT length: 2.383 ms., avg: 2.788 ms.
Best time for 3072K FFT length: 2.541 ms., avg: 3.026 ms.
Best time for 3584K FFT length: 3.094 ms., avg: 3.701 ms.
Best time for 4096K FFT length: 3.232 ms., avg: 4.024 ms.
Best time for 5120K FFT length: 3.966 ms., avg: 5.013 ms.
Best time for 6144K FFT length: 4.804 ms., avg: 5.902 ms.
Best time for 7168K FFT length: 5.403 ms., avg: 6.469 ms.
Best time for 8192K FFT length: 6.427 ms., avg: 7.618 ms.
Timing FFTs using 10 threads.
Best time for 1024K FFT length: 0.956 ms., avg: 1.025 ms.
Best time for 1280K FFT length: 1.244 ms., avg: 1.319 ms.
Best time for 1536K FFT length: 1.426 ms., avg: 1.643 ms.
Best time for 1792K FFT length: 1.554 ms., avg: 1.767 ms.
Best time for 2048K FFT length: 1.801 ms., avg: 2.402 ms.
Best time for 2560K FFT length: 2.081 ms., avg: 2.311 ms.
Best time for 3072K FFT length: 2.406 ms., avg: 2.933 ms.
Best time for 3584K FFT length: 2.542 ms., avg: 2.899 ms.
Best time for 4096K FFT length: 3.061 ms., avg: 3.676 ms.
Best time for 5120K FFT length: 3.612 ms., avg: 4.502 ms.
Best time for 6144K FFT length: 4.422 ms., avg: 5.500 ms.
Best time for 7168K FFT length: 5.148 ms., avg: 6.105 ms.
Best time for 8192K FFT length: 6.113 ms., avg: 7.234 ms.

Timings for 1024K FFT length (10 cpus, 10 workers):  6.57,  6.54,  6.70,  6.50,  6.75,  6.76,  6.75,  6.61,  6.61,  6.59 ms.  Throughput: 1506.84 iter/sec.
Timings for 1280K FFT length (10 cpus, 10 workers):  8.63,  8.31,  8.35,  8.34,  8.36,  8.65,  8.51,  8.30,  8.31,  8.35 ms.  Throughput: 1189.23 iter/sec.
Timings for 1536K FFT length (10 cpus, 10 workers):  9.95,  9.79,  9.96,  9.89,  9.66, 10.06, 10.01, 10.24,  9.76,  9.85 ms.  Throughput: 1008.81 iter/sec.
Timings for 1792K FFT length (10 cpus, 10 workers): 11.58, 11.51, 11.50, 12.07, 11.56, 11.75, 11.60, 11.51, 11.65, 11.53 ms.  Throughput: 860.38 iter/sec.
Timings for 2048K FFT length (10 cpus, 10 workers): 13.58, 13.28, 13.90, 13.26, 13.26, 13.89, 13.39, 13.17, 13.26, 13.39 ms.  Throughput: 744.47 iter/sec.
Timings for 2560K FFT length (10 cpus, 10 workers): 17.71, 16.41, 16.46, 16.45, 16.42, 16.98, 16.65, 16.40, 16.39, 16.45 ms.  Throughput: 601.54 iter/sec.
Timings for 3072K FFT length (10 cpus, 10 workers): 19.92, 19.62, 19.56, 19.66, 19.63, 21.97, 19.86, 20.04, 19.73, 19.67 ms.  Throughput: 501.40 iter/sec.
[Wed Jun 22 09:18:27 2016]
Timings for 3584K FFT length (10 cpus, 10 workers): 23.27, 23.41, 23.01, 23.16, 23.10, 24.97, 23.76, 23.22, 22.94, 23.15 ms.  Throughput: 427.62 iter/sec.
Timings for 4096K FFT length (10 cpus, 10 workers): 26.31, 26.31, 26.21, 26.42, 26.62, 27.58, 28.83, 26.46, 26.21, 26.89 ms.  Throughput: 373.65 iter/sec.
Timings for 5120K FFT length (10 cpus, 10 workers): 33.71, 33.73, 33.31, 33.16, 33.56, 36.04, 34.43, 34.69, 33.84, 33.69 ms.  Throughput: 294.13 iter/sec.
Timings for 6144K FFT length (10 cpus, 10 workers): 41.91, 41.32, 41.41, 41.41, 41.45, 43.62, 42.29, 42.35, 41.66, 41.52 ms.  Throughput: 238.75 iter/sec.
Timings for 7168K FFT length (10 cpus, 10 workers): 48.99, 49.07, 48.48, 50.00, 48.35, 50.35, 49.62, 48.42, 49.54, 49.20 ms.  Throughput: 203.28 iter/sec.
Timings for 8192K FFT length (10 cpus, 10 workers): 56.28, 55.49, 57.82, 55.58, 55.99, 57.60, 56.53, 55.49, 55.58, 57.00 ms.  Throughput: 177.54 iter/sec.
When I get a chance, I'll run the benchmark at the stock clock speed and post the results.
bmurray7JHU is offline   Reply With Quote
Old 2016-06-22, 22:44   #667
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

3,313 Posts
Default Broadwell vs Haswell (dual 14-core, same clock)

Well, the new server arrived today, earlier than I expected.

I'm still installing the OS and what not but I had time to squeeze in a quick test.

This is a dual 14-core E5-2690v4 and I'm comparing it to a dual 14-core E5-2697v3. They are very similar CPUs... both clocked at 2.6 GHz, both 14-core.

And, as it turns out, both are using DDR4-2133. See, the server came with a certain amount of RAM (64GB) and I added another 192GB, which results in 2DPC on both CPUs (16GB modules, 16 in total).

Well... imagine my surprise when I looked at the lights-out instrumentation and saw it was only running at 2133 MHz even though it recognizes all of the DIMMs are 2400 MHz. Turns out, after digging in the quickspecs, that it only runs @ 2400 with 2DPC if you're using dual-ranked 16GB modules, and I have a bunch of single-ranked sticks. Oh well... that's what it came with also, so if I wanted dual ranks I would have had to get 4 more to replace the factory installed stuff (or spent more anyway on a custom order instead of the smart buy).

Eh, it's kind of a nuisance but it's not really a big deal. And I got to compare the v3 and v4 CPUs even more directly, getting down to just what the CPU itself is bringing to the table.

Turns out, quite a lot.

I took the work from the v3 box and put it on the v4, started both up and took a screenshot side-by-side so you can see the difference. v4 on the left, v3 on the right. (Notice that Prime95 28.9 doesn't identify the new chip yet).

Oh, I'll probably tweak the system and either remove the 2nd DIMM per channel or set it to mirroring mode (that may or may not run them at the equivalent of 1DPC ?) just to do another test @ 2400 to see how much more that helps.
Attached Thumbnails: v4 versus v3.jpg
Madpoo is offline   Reply With Quote
Old 2016-06-22, 23:29   #668
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

3,313 Posts
Default

Quote:
Originally Posted by Madpoo View Post
Oh, I'll probably tweak the system and either remove the 2nd DIMM per channel or set it to mirroring mode (that may or may not run them at the equivalent of 1DPC ?) just to do another test @ 2400 to see how much more that helps.
I didn't think simply setting the memory to spare/mirrored would make it run @ 2400, so I just removed the 2nd DIMM per channel. Here's the same view but with the v4 CPU running DDR4-2400. Really not as dramatic a change.
Attached Thumbnails: v4 at 2400 versus v3.jpg
Madpoo is offline   Reply With Quote
Old 2016-06-22, 23:40   #669
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

19×397 Posts
Default

Quote:
Originally Posted by Madpoo View Post
(Notice that Prime95 28.9 doesn't identify the new chip yet).
Please do an Advanced/Time 9950 and post (or email) the results. Thanks.
Prime95 is offline   Reply With Quote
Old 2016-06-23, 03:12   #670
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

110011110001₂ Posts
Default

Quote:
Originally Posted by Prime95 View Post
Please do an Advanced/Time 9950 and post (or email) the results. Thanks.
Hmm... done. Not sure what all that is; I guess it's a dump of the CPUID leaves that identifies the CPU specifics.

Code:
[Thu Jun 23 03:09:04 2016]
i: 00000000, EAX: 0000000D, EBX: 756E6547, ECX: 6C65746E, EDX: 49656E69
i: 00000001, EAX: 000406F1, EBX: 22200800, ECX: FEF87383, EDX: BFCBFBFF
i: 00000002, EAX: 76036301, EBX: 00F0B5FF, ECX: 00000000, EDX: 00C30000
i: 00000003, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000004, EAX: 3C004121, EBX: 01C0003F, ECX: 0000003F, EDX: 00000000
i: 00000004, EAX: 3C004122, EBX: 01C0003F, ECX: 0000003F, EDX: 00000000
i: 00000004, EAX: 3C004143, EBX: 01C0003F, ECX: 000001FF, EDX: 00000000
i: 00000004, EAX: 3C07C163, EBX: 04C0003F, ECX: 00006FFF, EDX: 00000002
i: 00000004, EAX: 3C004000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000005, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000006, EAX: 00000077, EBX: 00000002, ECX: 00000001, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00002BB9, ECX: 00000000, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000008, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000009, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 0000000A, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 0000000B, EAX: 00000001, EBX: 00000002, ECX: 00000100, EDX: 00000022
i: 0000000B, EAX: 00000005, EBX: 0000001C, ECX: 00000201, EDX: 00000022
i: 0000000B, EAX: 00000000, EBX: 00000000, ECX: 00000002, EDX: 00000022
i: 0000000B, EAX: 00000000, EBX: 00000000, ECX: 00000003, EDX: 00000022
i: 0000000B, EAX: 00000000, EBX: 00000000, ECX: 00000004, EDX: 00000022
i: 0000000C, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 0000000D, EAX: 00000007, EBX: 00000340, ECX: 00000340, EDX: 00000000
i: 80000000, EAX: 80000008, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 80000001, EAX: 00000000, EBX: 00000000, ECX: 00000121, EDX: 28100800
i: 80000002, EAX: 65746E49, EBX: 2952286C, ECX: 6F655820, EDX: 2952286E
i: 80000003, EAX: 55504320, EBX: 2D354520, ECX: 30393632, EDX: 20347620
i: 80000004, EAX: 2E322040, EBX: 48473036, ECX: 0000007A, EDX: 00000000
i: 80000005, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 80000006, EAX: 00000000, EBX: 00000000, ECX: 01006040, EDX: 00000000
i: 80000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000100
i: 80000008, EAX: 0000302E, EBX: 00000000, ECX: 00000000, EDX: 00000000
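For what it's worth, the leaf-1 EAX value in that dump (000406F1) encodes the family/model/stepping. Decoding it with the standard x86 CPUID field layout (my sketch, not anything from Prime95's source) gives Broadwell-EP:

```python
# Decode family/model/stepping from CPUID leaf 1, EAX = 0x000406F1
# (the value in the dump above), per the standard x86 field layout.
eax = 0x000406F1
stepping   = eax & 0xF
model      = (eax >> 4) & 0xF
family     = (eax >> 8) & 0xF
ext_model  = (eax >> 16) & 0xF
ext_family = (eax >> 20) & 0xFF

# Extended fields only apply for certain base family values.
display_family = family + ext_family if family == 0xF else family
display_model  = (ext_model << 4) | model if family in (0x6, 0xF) else model

print(f"family {display_family}, model {display_model}, stepping {stepping}")
# prints: family 6, model 79, stepping 1
```

Family 6, model 79 is Broadwell-EP, a model number that didn't exist when 28.9's CPU table was built, which would explain why it shows up as unrecognized.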
Madpoo is offline   Reply With Quote
Old 2016-06-23, 04:50   #671
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

3,313 Posts
Default

Quote:
Originally Posted by Madpoo View Post
I didn't think simply setting the memory to spare/mirrored would make it run @ 2400 so I just removed the 2nd dimm per channel. Here's the same view but with the v4 CPU running DDR4-2400. Really not as dramatic a change.
Well, I did some benchmarks with it tonight and it's faster than the previous gen CPU for sure.

I was a little puzzled by the output of the benchmark throughput though... it was saying that the max throughput was with 28 threads/4 workers (as opposed to the 28 threads/2 workers that I would have picked as optimal).

Turns out I was right, not the benchmark. When I actually set it up that way, with 4 workers of 7 cores each, it did pretty badly when all 4 were running. If I stopped all but one worker, the remaining worker took off and did well as expected, so it's definitely a case of interference between multiple workers on the same CPU; not enough memory bandwidth to go around. And that was with "only" 2400K FFT sized work.

So, I'm sticking with what I can demonstrate is the ideal working setup... 1 worker using all cores on each CPU.

After an overnight burn-in, since it's a new machine and I want to torture test it, I'll see how it does with some 100M digit stuff by way of comparison, or maybe see how it does with M49 since I can look back at my confirmation run on that to see how long that took on the previous gen CPUs.
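One way to sanity-check the memory-contention theory is a back-of-envelope on aggregate bandwidth demand. All the numbers below are illustrative assumptions (per-worker iteration rate, passes over the data per iteration), not measurements; the point is just the arithmetic:

```python
# Back-of-envelope only: every figure here is an assumption.  If four
# workers each sustain some iteration rate on a ~32 MB (4096K) dataset,
# and each iteration streams that dataset several times, the aggregate
# bandwidth demand approaches the memory controller's ceiling.
workers = 4
iters_per_sec = 66     # assumed per-worker rate
dataset_mb = 32        # 4096K doubles at 8 bytes each

# Quad-channel DDR4-2133 theoretical peak per socket: MT/s * 8 B * 4 ch.
peak_gb_s = 2133e6 * 8 * 4 / 1e9

for passes in (2, 4, 8):
    demand_gb_s = workers * iters_per_sec * dataset_mb * passes / 1024
    print(f"{passes} passes/iter: ~{demand_gb_s:.1f} GB/s "
          f"(theoretical peak ~{peak_gb_s:.1f} GB/s)")
```

Under these assumptions, a handful of passes per iteration already puts four concurrent workers within reach of the socket's peak bandwidth, whereas one worker alone has plenty of headroom.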
Madpoo is offline   Reply With Quote
Old 2016-06-24, 12:09   #672
xtreme2k
 
xtreme2k's Avatar
 
Aug 2002

2·3·29 Posts
Default

Would you post the 1w/14t throughput please for the 2690V4?
xtreme2k is offline   Reply With Quote
Old 2016-06-24, 17:58   #673
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

3,313 Posts
Default

Quote:
Originally Posted by xtreme2k View Post
Would you post the 1w/14t throughput please for the 2690V4?
Yeah, although I don't have any confidence in the benchmark timings.

Code:
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (14 cpus, 1 worker):  2.22 ms.  Throughput: 451.28 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (14 cpus, 2 workers):  3.62,  3.59 ms.  Throughput: 554.82 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (14 cpus, 7 workers): 15.66, 15.37, 15.66, 18.62, 14.13, 14.49, 14.09 ms.  Throughput: 457.31 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (14 cpus, 14 workers): 28.98, 29.16, 29.45, 28.96, 28.89, 28.92, 29.07, 28.96, 29.27, 29.52, 28.93, 28.95, 29.20, 29.48 ms.  Throughput: 480.73 iter/sec.
You'll see that it's saying the max throughput is from 2 workers of 7 threads each, but in reality that was not the case. I set it up like that and the actual iteration times were more than twice as long as with 1 worker using all 14 threads.

FYI, I focused on just the 4M FFT size since that's around the current leading edge of first-time LL tests.

Similarly, with all 28 cores (it's a dual CPU system after all), the benchmark has the same story to tell... works best with workers of 7 cores each, but again that was not the case in practice.

Code:
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 1 worker):  2.42 ms.  Throughput: 413.78 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 2 workers):  5.56,  5.44 ms.  Throughput: 363.66 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 4 workers):  5.69,  5.26,  5.51,  6.02 ms.  Throughput: 713.61 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 7 workers): 14.08, 15.61, 12.16, 15.31, 13.06, 16.80, 11.98 ms.  Throughput: 502.18 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 14 workers): 28.38, 29.96, 27.59, 29.77, 19.85, 21.59, 20.19, 28.38, 27.04, 28.78, 29.24, 20.06, 21.79, 19.90 ms.  Throughput: 571.84 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 28 workers): 43.21, 43.40, 42.99, 43.45, 44.02, 43.51, 43.15, 45.82, 45.41, 45.46, 45.58, 46.21, 45.32, 44.86, 43.86, 45.31, 44.95, 43.48, 44.06, 43.94, 44.85, 45.71, 45.13, 45.53, 45.68, 45.36, 45.33, 45.59 ms.  Throughput: 626.90 iter/sec.
I don't know if the Prime95 benchmark timings are done in a totally different way than Prime95 itself, but the benchmark seems to miss the effects of memory contention that show up when the program is actually running. I'd just caution anyone who relies on the benchmark output to set up their system: also look at the actual iteration times with different numbers of cores/workers and see what works best in reality.
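For reference, the aggregate "Throughput" figure appears to be just the sum of the per-worker iteration rates (my inference from the numbers, not from the source); plugging in the 14-worker timings above reproduces the reported value:

```python
# Reconstruct the benchmark's aggregate "Throughput" from the per-worker
# iteration times (ms) in the "28 cpus, 14 workers" line above.
times_ms = [28.38, 29.96, 27.59, 29.77, 19.85, 21.59, 20.19,
            28.38, 27.04, 28.78, 29.24, 20.06, 21.79, 19.90]
throughput = sum(1000.0 / t for t in times_ms)   # iter/sec, summed
print(f"{throughput:.2f} iter/sec")              # ~571.8, matching the log
```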
Madpoo is offline   Reply With Quote
Old 2016-06-24, 18:15   #674
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

7·29² Posts
Default

It is worth noting that the second CPU didn't add that much compared with the first.
As I mentioned in the other thread, the CPUs are split into two NUMA nodes each. This could explain the 7-core behaviour.
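On Linux you can see how the BIOS's Cluster-on-Die setting carves up the sockets (a quick sketch using standard tools; the four-node layout is what COD on a dual 14-core Haswell-EP box would be expected to show):

```shell
# Show NUMA node count and per-node CPU lists.  With Cluster-on-Die
# enabled on dual 14-core Haswell-EP, expect 4 nodes of 7 cores each.
lscpu | grep -i '^NUMA'
cat /sys/devices/system/node/node*/cpulist
```

A 7-thread worker that stays inside one node keeps all its memory traffic local, which would explain why the benchmark favors 7-core groupings.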
henryzz is online now   Reply With Quote
Old 2016-06-24, 20:40   #675
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

19×397 Posts
Default

Quote:
Originally Posted by Madpoo View Post

Code:
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (14 cpus, 1 worker):  2.22 ms.
What are your benchmark settings in prime.txt? There is no type=0, pass1=4M FFT! Probably just an output bug.
Prime95 is offline   Reply With Quote
Old 2016-06-26, 02:48   #676
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

3,313 Posts
Default

Quote:
Originally Posted by Prime95 View Post
What are your benchmark settings in prime.txt? There is no type=0, pass1=4M FFT! Probably just an output bug.
Oh, I was using the AllBench=1 or whatever option and I think that's where the funny output came from. I used min/max FFT of 4096K so it didn't really do anything else too interesting.

I just used the AllBench=1 because the undoc said "This is only useful during the development cycle to find the optimal FFT implementations for each CPU."

I thought it might do something useful, but it only did the tests twice with no difference in the timings.

I do wonder if the benchmark isn't using the same affinity map I'm telling the program to use. For instance, with the two CPUs running as "2 workers, 14 cores each using all cores on a single CPU", I can run two tests side by side, no slowdown whatsoever compared to running one worker with 14 cores and the other CPU is idle.

However, that's not what the benchmark results indicate... I would expect double the throughput going from 14 cores/1 worker up to 28 cores/2 workers. Didn't work out that way though: it only went from 451.28 iter/sec to 363.66 iter/sec (it actually went down... that ain't right).

Thus my recommendation that if you want real data, do real tests for now.
Madpoo is offline   Reply With Quote
Old 2016-06-27, 09:13   #677
xtreme2k
 
xtreme2k's Avatar
 
Aug 2002

2·3·29 Posts
Default

Madpoo
In your posts #667 and #668 you indicate the 2690V4 is much faster than the 2697V3, but in your other posts that doesn't seem to be the case?

Is there a way to ensure one 1w/14t runs on CPU1, and the 2nd 1w/14t on CPU2?

Can you install HWiNFO64 to see what MHz the CPU is actually running at on a per-core level? CPU-Z only shows the first core.

It is also interesting to see bmurray7JHU's 6950X results, as P95 v28.9 actually recognises his Broadwell-E.
xtreme2k is offline   Reply With Quote
Old 2016-06-27, 21:38   #678
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

3,313 Posts
Default

Quote:
Originally Posted by xtreme2k View Post
Madpoo
In your posts #667 and #668 you indicate the 2690V4 is much faster than the 2697V3, but in your other posts that doesn't seem to be the case?
...
That's because both are true.

It's faster for smaller FFT sizes, but it's slower for larger FFT sizes, for whatever reason.

I saw the v4 would typically run at about 1x turbo multiplier faster compared to the v3, plus the faster DDR4 speed, which makes it even stranger that it does worse with larger exponents. Hopefully it's just a software thing, with Prime95 doing something "interesting" since it doesn't quite know what kind of CPU that is, or needs some tuning to optimize?

Since I'm not doing 100M digit tests, right now it's not bugging me too much. It runs faster with the current LL and DC wavefronts (call it ~37M and ~68M). I do wonder if it could actually be even faster with some tuning, but whatever... that'll come, if possible.

Last fiddled with by Madpoo on 2016-06-27 at 22:02
Madpoo is offline   Reply With Quote
Old 2016-06-27, 22:12   #679
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

3,313 Posts
Default

Quote:
Originally Posted by xtreme2k View Post
Is there a way to ensure 1w/14t run on the CPU1, and the 2nd 1w/14t on CPU2?
Yup, I'm sure about that. I verified from the Sysinternals "CoreInfo" that it's still following the Windows scheme of mapping core 0+1 as the physical/hyperthread cores, just like before. And then using the same AffinityScramble2 mapping that the 14-core v3 chip is using. And finally just looking at task manager with one graph per CPU (56 separate graphs show up) that if I start worker #1, the correct cores are all chugging along at 100%, and same if I only run worker #2.
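For anyone wanting to script the same kind of check on Linux, process affinity can be inspected and set directly (a generic sketch; the CPU choice is illustrative, not Madpoo's actual AffinityScramble2 mapping):

```python
import os

# Linux-only sketch: inspect and pin the current process's CPU affinity,
# which is the effect Prime95's AffinityScramble2 setting achieves
# internally.  Pinning to a single hypothetical CPU here, then restoring.
cpus = os.sched_getaffinity(0)              # logical CPUs we may run on
print(f"runnable on {len(cpus)} logical CPUs: {sorted(cpus)}")

worker1 = {min(cpus)}                       # hypothetical one-CPU mask
os.sched_setaffinity(0, worker1)
assert os.sched_getaffinity(0) == worker1   # verify the mask took effect
os.sched_setaffinity(0, cpus)               # restore the original mask
```

Watching per-CPU graphs in task manager, as described above, is the Windows equivalent of verifying that the mask took effect.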

Quote:
Can you install HWiNFO64 to see what MHz the CPU is actually running at on a per-core level? CPU-Z only shows the first core.
Did that and confirmed that it's running at 29x multiplier whether I'm doing a 68M exponent or a 332M exponent. And also confirmed that without any tests going on, they're static at 32x which matches the spec (26x stock plus 6x turbo with all cores enabled).

Quote:
It is also interesting to see bmurray7JHU's 6950X results, as P95 v28.9 actually recognises his Broadwell-E.
I guess the 6950X is slightly different from the Broadwell-EP processors... I don't know how Prime95 chooses to display the CPU model info. Other programs like CPU-Z, or even the old CoreInfo I have from 2014, can use the brand string reported by the CPUID ops, but I wonder if Prime95 is also looking up the CPU family/model in a table? I guess I could look at the source, but all I know is, Prime95 28.9 starts up and says:
Code:
[Main thread Jun 27 22:03] Mersenne number primality test program version 28.9
[Main thread Jun 27 22:03] Optimizing for CPU architecture: Unknown Intel, L2 cache size: 256 KB, L3 cache size: 35 MB
[Main thread Jun 27 22:03] Using AffinityScramble2 setting to set affinity mask.
[Main thread Jun 27 22:03] Starting workers.
Madpoo is offline   Reply With Quote
Old 2016-06-28, 01:01   #680
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

19×397 Posts
Default

Quote:
Originally Posted by Madpoo View Post
but I wonder if Prime95 is also using the cpu family/model in a table?
Yes.

However, almost all of prime95's decisions about which FFT implementation is appropriate are based on other CPUID flags (like FMA support, prefetch support, etc). Thus, my fixing the family/model table will make no difference.
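Those flags can be read straight out of the CPUID dump Madpoo posted a few posts up. Checking the bits for the features Prime95 reports (standard Intel bit positions; my decode, not Prime95's code):

```python
# Decode the CPUID feature bits relevant to FFT selection, using the
# leaf-1 ECX and leaf-7 EBX values from the dump posted earlier.
leaf1_ecx = 0xFEF87383   # CPUID(EAX=1).ECX from the dump
leaf7_ebx = 0x00002BB9   # CPUID(EAX=7,ECX=0).EBX from the dump

features = {
    "FMA":  bool(leaf1_ecx >> 12 & 1),   # leaf 1 ECX bit 12
    "AVX":  bool(leaf1_ecx >> 28 & 1),   # leaf 1 ECX bit 28
    "AVX2": bool(leaf7_ebx >> 5 & 1),    # leaf 7 EBX bit 5
}
print(features)   # all True on this E5-2690 v4
```

Since all the flags the FFT code checks are present, the unrecognized family/model only affects the displayed name, matching George's point.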

Last fiddled with by Prime95 on 2016-06-28 at 01:06
Prime95 is offline   Reply With Quote
Old 2016-06-28, 02:47   #681
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

110011110001₂ Posts
Default

Quote:
Originally Posted by Prime95 View Post
Yes.

However, almost all of prime95's decisions about which FFT implementation is appropriate are based on other CPUID flags (like FMA support, prefetch support, etc). Thus, my fixing the family/model table will make no difference.
That's kind of what I guessed. In my limited testing, it seemed like it picked the same FFT sizes that a "known" CPU would, but I confess I wasn't testing with exponents near the FFT boundaries.

Well, specifically for George, just holler if there are any tests or info you'd like me to run which might help out. I don't know enough about the operation of the program to even make a guess on whether there's something there that could make it slower at the larger FFTs. On the hardware side I'm not aware of anything either; on the contrary, everything suggests it should still run faster just like it does at the smaller FFTs.

I suppose it could be something else server centric... not the CPU or memory. Although the same server, motherboard, firmware, etc. is being used on the E5-2697 v3 and the E5-2690 v4... the only differences are the CPU and memory. Heck, they even have the same array controller and number/size of hard drives, same # of fans, power supplies, etc.

I had to reinstall the 2nd DIMM per channel in the new box today, in prep for shipping to its new home, so I can't test anything related to the 2400 MHz memory speed, but otherwise, just holler if there's something you'd like me to test or whatever.
Madpoo is offline   Reply With Quote
Old 2016-07-08, 01:15   #682
Xyzzy
 
Xyzzy's Avatar
 
"Mike"
Aug 2002

5·17·97 Posts
Default

Code:
AMD Athlon(tm) X4 880K Quad Core Processor     
CPU speed: 3992.52 MHz, 4 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, FMA
L1 cache size: 16 KB
L2 cache size: 2 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1024
Prime95 64-bit version 28.9, RdtscTiming=1
Best time for 1024K FFT length: 10.010 ms., avg: 11.270 ms.
Best time for 1280K FFT length: 12.614 ms., avg: 12.906 ms.
Best time for 1536K FFT length: 15.556 ms., avg: 15.815 ms.
Best time for 1792K FFT length: 18.877 ms., avg: 19.044 ms.
Best time for 2048K FFT length: 20.456 ms., avg: 20.495 ms.
Best time for 2560K FFT length: 26.180 ms., avg: 26.282 ms.
Best time for 3072K FFT length: 32.313 ms., avg: 32.739 ms.
Best time for 3584K FFT length: 39.220 ms., avg: 39.497 ms.
Best time for 4096K FFT length: 43.073 ms., avg: 44.114 ms.
Best time for 5120K FFT length: 56.635 ms., avg: 57.516 ms.
Best time for 6144K FFT length: 68.302 ms., avg: 69.329 ms.
Best time for 7168K FFT length: 82.788 ms., avg: 83.573 ms.
Best time for 8192K FFT length: 89.876 ms., avg: 91.534 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 7.857 ms., avg: 7.994 ms.
Best time for 1280K FFT length: 9.959 ms., avg: 10.085 ms.
Best time for 1536K FFT length: 12.108 ms., avg: 12.372 ms.
Best time for 1792K FFT length: 14.877 ms., avg: 15.078 ms.
Best time for 2048K FFT length: 16.218 ms., avg: 16.828 ms.
Best time for 2560K FFT length: 20.900 ms., avg: 20.945 ms.
Best time for 3072K FFT length: 25.536 ms., avg: 25.943 ms.
Best time for 3584K FFT length: 30.956 ms., avg: 31.765 ms.
Best time for 4096K FFT length: 33.613 ms., avg: 34.454 ms.
Best time for 5120K FFT length: 43.658 ms., avg: 45.174 ms.
Best time for 6144K FFT length: 54.065 ms., avg: 55.816 ms.
Best time for 7168K FFT length: 68.734 ms., avg: 69.902 ms.
Best time for 8192K FFT length: 70.916 ms., avg: 72.768 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 5.203 ms., avg: 5.248 ms.
Best time for 1280K FFT length: 6.508 ms., avg: 6.891 ms.
Best time for 1536K FFT length: 7.884 ms., avg: 8.572 ms.
Best time for 1792K FFT length: 9.488 ms., avg: 9.552 ms.
Best time for 2048K FFT length: 10.400 ms., avg: 10.825 ms.
Best time for 2560K FFT length: 13.212 ms., avg: 13.310 ms.
Best time for 3072K FFT length: 16.194 ms., avg: 16.283 ms.
Best time for 3584K FFT length: 19.408 ms., avg: 19.581 ms.
Best time for 4096K FFT length: 21.471 ms., avg: 21.603 ms.
Best time for 5120K FFT length: 27.852 ms., avg: 28.741 ms.
Best time for 6144K FFT length: 34.192 ms., avg: 35.219 ms.
Best time for 7168K FFT length: 42.224 ms., avg: 43.648 ms.
Best time for 8192K FFT length: 44.121 ms., avg: 44.815 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 4.659 ms., avg: 4.852 ms.
Best time for 1280K FFT length: 5.828 ms., avg: 6.408 ms.
Best time for 1536K FFT length: 7.082 ms., avg: 7.163 ms.
Best time for 1792K FFT length: 8.429 ms., avg: 8.568 ms.
Best time for 2048K FFT length: 9.309 ms., avg: 9.432 ms.
Best time for 2560K FFT length: 11.875 ms., avg: 12.728 ms.
Best time for 3072K FFT length: 14.419 ms., avg: 14.583 ms.
Best time for 3584K FFT length: 17.422 ms., avg: 17.557 ms.
Best time for 4096K FFT length: 19.073 ms., avg: 19.235 ms.
Best time for 5120K FFT length: 25.090 ms., avg: 26.228 ms.
Best time for 6144K FFT length: 31.130 ms., avg: 31.377 ms.
Best time for 7168K FFT length: 39.248 ms., avg: 40.266 ms.
Best time for 8192K FFT length: 40.231 ms., avg: 41.124 ms.

Timings for 1024K FFT length (4 cpus, 4 workers): 17.15, 16.66, 16.73, 16.63 ms.  Throughput: 238.28 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 30.26, 25.75, 27.13, 25.69 ms.  Throughput: 147.66 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 46.37, 38.02, 34.43, 26.32 ms.  Throughput: 114.91 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 36.96, 32.14, 32.23, 31.86 ms.  Throughput: 120.59 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 75.09, 42.93, 47.02, 45.64 ms.  Throughput: 79.79 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 179.39, 33.97, 44.16, 43.37 ms.  Throughput: 80.71 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 175.11, 181.60, 175.32, 173.85 ms.  Throughput: 22.67 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 217.28, 181.99, 97.64, 96.79 ms.  Throughput: 30.67 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 209.09, 139.08, 101.27, 77.22 ms.  Throughput: 34.80 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 403.35, 149.38, 122.86, 162.36 ms.  Throughput: 23.47 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 298.96, 220.48, 143.27, 181.14 ms.  Throughput: 20.38 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 220.75, 187.36, 173.04, 155.11 ms.  Throughput: 22.09 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 285.44, 156.77, 210.51, 173.70 ms.  Throughput: 20.39 iter/sec.
Memory = DDR 1333 (No XMP)
Attached Thumbnails: CPU.png (27.2 KB), Memory.png (15.5 KB)
Xyzzy is offline   Reply With Quote
Old 2016-07-08, 15:28   #683
Xyzzy
 
Xyzzy's Avatar
 
"Mike"
Aug 2002

100000001101012 Posts
Default

Code:
AMD Athlon(tm) X4 880K Quad Core Processor     
CPU speed: 3992.60 MHz, 4 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, FMA
L1 cache size: 16 KB
L2 cache size: 2 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1024
Prime95 64-bit version 28.9, RdtscTiming=1
Best time for 1024K FFT length: 9.441 ms., avg: 10.784 ms.
Best time for 1280K FFT length: 11.908 ms., avg: 12.144 ms.
Best time for 1536K FFT length: 14.638 ms., avg: 14.966 ms.
Best time for 1792K FFT length: 17.677 ms., avg: 17.876 ms.
Best time for 2048K FFT length: 19.207 ms., avg: 19.423 ms.
Best time for 2560K FFT length: 24.666 ms., avg: 25.159 ms.
Best time for 3072K FFT length: 30.524 ms., avg: 30.749 ms.
Best time for 3584K FFT length: 36.833 ms., avg: 37.386 ms.
Best time for 4096K FFT length: 40.447 ms., avg: 41.105 ms.
Best time for 5120K FFT length: 53.737 ms., avg: 54.486 ms.
Best time for 6144K FFT length: 63.972 ms., avg: 65.023 ms.
Best time for 7168K FFT length: 77.364 ms., avg: 78.493 ms.
Best time for 8192K FFT length: 84.527 ms., avg: 89.038 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 7.499 ms., avg: 7.625 ms.
Best time for 1280K FFT length: 9.555 ms., avg: 9.729 ms.
Best time for 1536K FFT length: 12.650 ms., avg: 22.931 ms.
Best time for 1792K FFT length: 14.147 ms., avg: 14.607 ms.
Best time for 2048K FFT length: 16.007 ms., avg: 16.346 ms.
Best time for 2560K FFT length: 20.532 ms., avg: 20.780 ms.
Best time for 3072K FFT length: 24.693 ms., avg: 25.131 ms.
Best time for 3584K FFT length: 30.291 ms., avg: 30.659 ms.
Best time for 4096K FFT length: 32.720 ms., avg: 33.267 ms.
Best time for 5120K FFT length: 41.912 ms., avg: 42.955 ms.
Best time for 6144K FFT length: 53.026 ms., avg: 53.672 ms.
Best time for 7168K FFT length: 65.125 ms., avg: 65.908 ms.
Best time for 8192K FFT length: 68.750 ms., avg: 69.804 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 4.818 ms., avg: 5.150 ms.
Best time for 1280K FFT length: 6.056 ms., avg: 6.249 ms.
Best time for 1536K FFT length: 7.297 ms., avg: 7.455 ms.
Best time for 1792K FFT length: 8.795 ms., avg: 8.901 ms.
Best time for 2048K FFT length: 9.550 ms., avg: 9.684 ms.
Best time for 2560K FFT length: 12.174 ms., avg: 12.753 ms.
Best time for 3072K FFT length: 14.993 ms., avg: 15.543 ms.
Best time for 3584K FFT length: 18.155 ms., avg: 18.328 ms.
Best time for 4096K FFT length: 20.102 ms., avg: 20.589 ms.
Best time for 5120K FFT length: 26.095 ms., avg: 26.448 ms.
Best time for 6144K FFT length: 31.990 ms., avg: 32.669 ms.
Best time for 7168K FFT length: 39.508 ms., avg: 40.116 ms.
Best time for 8192K FFT length: 41.524 ms., avg: 42.214 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 4.291 ms., avg: 4.371 ms.
Best time for 1280K FFT length: 5.397 ms., avg: 6.000 ms.
Best time for 1536K FFT length: 6.452 ms., avg: 7.216 ms.
Best time for 1792K FFT length: 7.789 ms., avg: 8.071 ms.
Best time for 2048K FFT length: 8.608 ms., avg: 9.367 ms.
Best time for 2560K FFT length: 10.918 ms., avg: 10.998 ms.
Best time for 3072K FFT length: 13.307 ms., avg: 13.401 ms.
Best time for 3584K FFT length: 16.634 ms., avg: 17.332 ms.
Best time for 4096K FFT length: 18.184 ms., avg: 19.236 ms.
Best time for 5120K FFT length: 23.220 ms., avg: 24.465 ms.
Best time for 6144K FFT length: 28.705 ms., avg: 29.613 ms.
Best time for 7168K FFT length: 35.885 ms., avg: 36.748 ms.
Best time for 8192K FFT length: 38.086 ms., avg: 38.898 ms.

Timings for 1024K FFT length (4 cpus, 4 workers): 15.36, 15.23, 15.07, 15.43 ms.  Throughput: 261.92 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 24.47, 24.32, 24.13, 24.73 ms.  Throughput: 163.85 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 23.91, 23.74, 23.59, 24.18 ms.  Throughput: 167.70 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 29.62, 29.24, 29.66, 30.10 ms.  Throughput: 134.91 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 35.08, 35.02, 34.80, 35.58 ms.  Throughput: 113.90 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 40.62, 40.41, 40.57, 41.66 ms.  Throughput: 98.02 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 49.67, 49.48, 49.44, 50.66 ms.  Throughput: 80.31 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 60.40, 59.99, 59.57, 61.18 ms.  Throughput: 66.36 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 65.82, 65.35, 66.85, 68.03 ms.  Throughput: 60.15 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 122.09, 127.83, 105.49, 108.19 ms.  Throughput: 34.74 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 116.66, 115.89, 117.18, 119.84 ms.  Throughput: 34.08 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 146.99, 133.00, 134.10, 137.95 ms.  Throughput: 29.03 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 142.22, 141.87, 147.12, 151.41 ms.  Throughput: 27.48 iter/sec.
Memory = DDR 1600 (XMP)
Attached Thumbnails: CPU.png (27.2 KB), Memory.png (15.5 KB)
Xyzzy is offline   Reply With Quote
Old 2016-07-15, 09:03   #684
Antonio
 
Antonio's Avatar
 
"Antonio Key"
Sep 2011
UK

32·59 Posts
Default

MSI GP62 laptop, as supplied:
Code:
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
CPU speed: 2871.05 MHz, 4 hyperthreaded cores
CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.9, RdtscTiming=1
Best time for 1024K FFT length: 4.296 ms., avg: 4.336 ms.
Best time for 1280K FFT length: 5.481 ms., avg: 5.540 ms.
Best time for 1536K FFT length: 6.635 ms., avg: 6.694 ms.
Best time for 1792K FFT length: 8.335 ms., avg: 8.458 ms.
Best time for 2048K FFT length: 8.855 ms., avg: 8.967 ms.
Best time for 2560K FFT length: 11.739 ms., avg: 11.834 ms.
Best time for 3072K FFT length: 13.918 ms., avg: 14.056 ms.
Best time for 3584K FFT length: 16.753 ms., avg: 16.865 ms.
Best time for 4096K FFT length: 19.382 ms., avg: 19.544 ms.
Best time for 5120K FFT length: 24.603 ms., avg: 24.824 ms.
Best time for 6144K FFT length: 27.600 ms., avg: 28.008 ms.
Best time for 7168K FFT length: 32.969 ms., avg: 33.497 ms.
Best time for 8192K FFT length: 37.353 ms., avg: 37.655 ms.
Timing FFTs using 2 threads on 1 physical CPU.
Best time for 1024K FFT length: 3.808 ms., avg: 3.880 ms.
Best time for 1280K FFT length: 4.966 ms., avg: 5.061 ms.
Best time for 1536K FFT length: 5.993 ms., avg: 6.111 ms.
Best time for 1792K FFT length: 7.315 ms., avg: 7.440 ms.
Best time for 2048K FFT length: 8.228 ms., avg: 8.391 ms.
Best time for 2560K FFT length: 10.482 ms., avg: 10.631 ms.
Best time for 3072K FFT length: 12.651 ms., avg: 12.874 ms.
Best time for 3584K FFT length: 15.118 ms., avg: 15.404 ms.
Best time for 4096K FFT length: 17.592 ms., avg: 17.923 ms.
Best time for 5120K FFT length: 22.921 ms., avg: 23.173 ms.
Best time for 6144K FFT length: 27.239 ms., avg: 27.720 ms.
Best time for 7168K FFT length: 32.441 ms., avg: 33.009 ms.
Best time for 8192K FFT length: 38.048 ms., avg: 38.388 ms.
Timing FFTs using 2 threads on 2 physical CPUs.
Best time for 1024K FFT length: 2.362 ms., avg: 2.405 ms.
Best time for 1280K FFT length: 3.054 ms., avg: 3.189 ms.
Best time for 1536K FFT length: 3.792 ms., avg: 3.883 ms.
Best time for 1792K FFT length: 4.741 ms., avg: 4.890 ms.
Best time for 2048K FFT length: 5.345 ms., avg: 5.441 ms.
Best time for 2560K FFT length: 6.783 ms., avg: 6.906 ms.
Best time for 3072K FFT length: 8.151 ms., avg: 8.300 ms.
Best time for 3584K FFT length: 9.880 ms., avg: 10.072 ms.
Best time for 4096K FFT length: 11.451 ms., avg: 11.579 ms.
Best time for 5120K FFT length: 14.355 ms., avg: 14.571 ms.
Best time for 6144K FFT length: 17.662 ms., avg: 17.914 ms.
Best time for 7168K FFT length: 20.864 ms., avg: 21.212 ms.
Best time for 8192K FFT length: 23.734 ms., avg: 24.057 ms.
Timing FFTs using 3 threads on 3 physical CPUs.
Best time for 1024K FFT length: 1.951 ms., avg: 2.031 ms.
Best time for 1280K FFT length: 2.686 ms., avg: 2.753 ms.
Best time for 1536K FFT length: 3.411 ms., avg: 3.547 ms.
Best time for 1792K FFT length: 4.307 ms., avg: 4.381 ms.
Best time for 2048K FFT length: 4.867 ms., avg: 4.966 ms.
Best time for 2560K FFT length: 6.392 ms., avg: 6.484 ms.
Best time for 3072K FFT length: 7.881 ms., avg: 7.952 ms.
Best time for 3584K FFT length: 9.333 ms., avg: 9.501 ms.
Best time for 4096K FFT length: 10.777 ms., avg: 10.868 ms.
Best time for 5120K FFT length: 13.557 ms., avg: 13.726 ms.
Best time for 6144K FFT length: 16.558 ms., avg: 16.693 ms.
Best time for 7168K FFT length: 19.388 ms., avg: 19.540 ms.
Best time for 8192K FFT length: 22.298 ms., avg: 22.467 ms.
Timing FFTs using 4 threads on 4 physical CPUs.
Best time for 1024K FFT length: 1.897 ms., avg: 1.975 ms.
Best time for 1280K FFT length: 2.674 ms., avg: 2.805 ms.
Best time for 1536K FFT length: 3.437 ms., avg: 3.730 ms.
Best time for 1792K FFT length: 4.300 ms., avg: 4.418 ms.
Best time for 2048K FFT length: 4.936 ms., avg: 5.046 ms.
Best time for 2560K FFT length: 6.491 ms., avg: 6.630 ms.
Best time for 3072K FFT length: 7.857 ms., avg: 8.010 ms.
Best time for 3584K FFT length: 9.462 ms., avg: 9.589 ms.
Best time for 4096K FFT length: 10.774 ms., avg: 10.910 ms.
Best time for 5120K FFT length: 13.670 ms., avg: 13.807 ms.
Best time for 6144K FFT length: 16.691 ms., avg: 16.902 ms.
Best time for 7168K FFT length: 19.578 ms., avg: 19.806 ms.
Best time for 8192K FFT length: 22.557 ms., avg: 22.794 ms.
Timing FFTs using 8 threads on 4 physical CPUs.
Best time for 1024K FFT length: 2.001 ms., avg: 2.117 ms.
Best time for 1280K FFT length: 2.932 ms., avg: 3.025 ms.
Best time for 1536K FFT length: 3.772 ms., avg: 4.140 ms.
Best time for 1792K FFT length: 4.527 ms., avg: 4.628 ms.
Best time for 2048K FFT length: 5.495 ms., avg: 5.611 ms.
Best time for 2560K FFT length: 6.950 ms., avg: 7.051 ms.
Best time for 3072K FFT length: 8.477 ms., avg: 8.608 ms.
Best time for 3584K FFT length: 10.001 ms., avg: 10.178 ms.
Best time for 4096K FFT length: 11.566 ms., avg: 11.700 ms.
Best time for 5120K FFT length: 14.591 ms., avg: 14.718 ms.
Best time for 6144K FFT length: 19.094 ms., avg: 19.274 ms.
Best time for 7168K FFT length: 22.620 ms., avg: 22.817 ms.
Best time for 8192K FFT length: 26.482 ms., avg: 26.792 ms.
Timings for 1024K FFT length (4 cpus, 4 workers): 10.56, 10.46, 10.47, 10.52 ms.  Throughput: 380.89 iter/sec.
Timings for 1024K FFT length (4 cpus hyperthreaded, 4 workers): 11.78, 10.89, 11.52, 11.12 ms.  Throughput: 353.38 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 13.07, 12.96, 13.01, 13.08 ms.  Throughput: 306.93 iter/sec.
Timings for 1280K FFT length (4 cpus hyperthreaded, 4 workers): 16.58, 14.81, 14.08, 13.68 ms.  Throughput: 271.99 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 16.03, 15.61, 15.57, 15.68 ms.  Throughput: 254.46 iter/sec.
Timings for 1536K FFT length (4 cpus hyperthreaded, 4 workers): 17.75, 16.63, 17.37, 17.12 ms.  Throughput: 232.48 iter/sec.
[Fri Jul 15 07:33:52 2016]
Timings for 1792K FFT length (4 cpus, 4 workers): 19.24, 18.95, 19.24, 19.03 ms.  Throughput: 209.29 iter/sec.
Timings for 1792K FFT length (4 cpus hyperthreaded, 4 workers): 20.69, 20.12, 20.39, 19.94 ms.  Throughput: 197.22 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 21.75, 21.74, 21.37, 22.04 ms.  Throughput: 184.15 iter/sec.
Timings for 2048K FFT length (4 cpus hyperthreaded, 4 workers): 24.47, 23.14, 23.41, 23.05 ms.  Throughput: 170.20 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 27.64, 27.75, 28.01, 27.75 ms.  Throughput: 143.95 iter/sec.
Timings for 2560K FFT length (4 cpus hyperthreaded, 4 workers): 30.27, 28.52, 29.76, 28.98 ms.  Throughput: 136.22 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 33.39, 33.72, 33.75, 33.93 ms.  Throughput: 118.71 iter/sec.
Timings for 3072K FFT length (4 cpus hyperthreaded, 4 workers): 35.12, 34.94, 35.22, 38.68 ms.  Throughput: 111.34 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 39.08, 39.42, 39.21, 39.07 ms.  Throughput: 102.06 iter/sec.
Timings for 3584K FFT length (4 cpus hyperthreaded, 4 workers): 42.18, 40.12, 41.65, 40.79 ms.  Throughput: 97.17 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 45.15, 44.39, 44.75, 44.55 ms.  Throughput: 89.46 iter/sec.
Timings for 4096K FFT length (4 cpus hyperthreaded, 4 workers): 48.58, 46.52, 47.66, 46.46 ms.  Throughput: 84.59 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 56.36, 55.86, 56.21, 55.81 ms.  Throughput: 71.35 iter/sec.
Timings for 5120K FFT length (4 cpus hyperthreaded, 4 workers): 61.82, 57.63, 59.08, 58.33 ms.  Throughput: 67.60 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 68.90, 67.47, 66.88, 67.70 ms.  Throughput: 59.06 iter/sec.
Timings for 6144K FFT length (4 cpus hyperthreaded, 4 workers): 81.01, 79.28, 78.07, 77.12 ms.  Throughput: 50.73 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 81.19, 79.26, 80.07, 79.56 ms.  Throughput: 49.99 iter/sec.
Timings for 7168K FFT length (4 cpus hyperthreaded, 4 workers): 98.60, 88.75, 105.43, 91.10 ms.  Throughput: 41.87 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 93.06, 91.45, 90.43, 90.39 ms.  Throughput: 43.80 iter/sec.
Timings for 8192K FFT length (4 cpus hyperthreaded, 4 workers): 112.22, 106.81, 109.83, 108.19 ms.  Throughput: 36.62 iter/sec.
memory 1*8GB DDR4 2133MHz CL= 15

Last fiddled with by Antonio on 2016-07-15 at 09:08
Antonio is offline   Reply With Quote
Old 2016-07-15, 09:06   #685
Antonio
 
Antonio's Avatar
 
"Antonio Key"
Sep 2011
UK

32×59 Posts
Default

MSI GP62 laptop, after upgrade:
Code:
[Fri Jul 15 08:49:20 2016]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
CPU speed: 2965.72 MHz, 4 hyperthreaded cores
CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.9, RdtscTiming=1
Best time for 1024K FFT length: 4.274 ms., avg: 4.438 ms.
Best time for 1280K FFT length: 5.459 ms., avg: 5.515 ms.
Best time for 1536K FFT length: 6.589 ms., avg: 6.723 ms.
Best time for 1792K FFT length: 8.251 ms., avg: 9.498 ms.
Best time for 2048K FFT length: 8.773 ms., avg: 8.905 ms.
Best time for 2560K FFT length: 11.494 ms., avg: 11.603 ms.
Best time for 3072K FFT length: 13.771 ms., avg: 13.922 ms.
Best time for 3584K FFT length: 16.668 ms., avg: 19.400 ms.
Best time for 4096K FFT length: 19.024 ms., avg: 21.261 ms.
Best time for 5120K FFT length: 24.186 ms., avg: 32.477 ms.
Best time for 6144K FFT length: 26.921 ms., avg: 29.837 ms.
Best time for 7168K FFT length: 32.441 ms., avg: 33.160 ms.
Best time for 8192K FFT length: 37.176 ms., avg: 38.156 ms.
Timing FFTs using 2 threads on 1 physical CPU.
Best time for 1024K FFT length: 3.776 ms., avg: 3.837 ms.
Best time for 1280K FFT length: 4.903 ms., avg: 4.995 ms.
Best time for 1536K FFT length: 5.941 ms., avg: 6.127 ms.
Best time for 1792K FFT length: 7.319 ms., avg: 7.416 ms.
Best time for 2048K FFT length: 8.142 ms., avg: 8.303 ms.
Best time for 2560K FFT length: 10.387 ms., avg: 10.617 ms.
Best time for 3072K FFT length: 12.619 ms., avg: 12.749 ms.
Best time for 3584K FFT length: 15.004 ms., avg: 15.274 ms.
Best time for 4096K FFT length: 17.472 ms., avg: 18.849 ms.
Best time for 5120K FFT length: 22.309 ms., avg: 22.905 ms.
Best time for 6144K FFT length: 26.976 ms., avg: 27.653 ms.
Best time for 7168K FFT length: 32.541 ms., avg: 33.157 ms.
Best time for 8192K FFT length: 37.688 ms., avg: 38.480 ms.
Timing FFTs using 2 threads on 2 physical CPUs.
Best time for 1024K FFT length: 2.211 ms., avg: 2.262 ms.
Best time for 1280K FFT length: 2.788 ms., avg: 2.820 ms.
Best time for 1536K FFT length: 3.372 ms., avg: 3.444 ms.
Best time for 1792K FFT length: 4.237 ms., avg: 4.317 ms.
Best time for 2048K FFT length: 4.505 ms., avg: 4.568 ms.
Best time for 2560K FFT length: 5.888 ms., avg: 5.999 ms.
Best time for 3072K FFT length: 7.014 ms., avg: 7.116 ms.
Best time for 3584K FFT length: 8.478 ms., avg: 8.666 ms.
Best time for 4096K FFT length: 9.681 ms., avg: 9.780 ms.
Best time for 5120K FFT length: 12.460 ms., avg: 12.562 ms.
Best time for 6144K FFT length: 13.922 ms., avg: 14.115 ms.
Best time for 7168K FFT length: 16.724 ms., avg: 16.875 ms.
Best time for 8192K FFT length: 19.171 ms., avg: 19.321 ms.
Timing FFTs using 3 threads on 3 physical CPUs.
Best time for 1024K FFT length: 1.510 ms., avg: 1.547 ms.
Best time for 1280K FFT length: 1.904 ms., avg: 2.005 ms.
Best time for 1536K FFT length: 2.285 ms., avg: 2.347 ms.
Best time for 1792K FFT length: 2.869 ms., avg: 2.911 ms.
Best time for 2048K FFT length: 3.150 ms., avg: 3.250 ms.
Best time for 2560K FFT length: 4.002 ms., avg: 4.057 ms.
Best time for 3072K FFT length: 4.764 ms., avg: 4.842 ms.
Best time for 3584K FFT length: 5.707 ms., avg: 5.808 ms.
Best time for 4096K FFT length: 6.606 ms., avg: 6.764 ms.
Best time for 5120K FFT length: 8.471 ms., avg: 8.612 ms.
Best time for 6144K FFT length: 9.700 ms., avg: 9.822 ms.
Best time for 7168K FFT length: 11.584 ms., avg: 11.772 ms.
Best time for 8192K FFT length: 13.342 ms., avg: 13.598 ms.
Timing FFTs using 4 threads on 4 physical CPUs.
Best time for 1024K FFT length: 1.172 ms., avg: 1.264 ms.
Best time for 1280K FFT length: 1.499 ms., avg: 1.568 ms.
Best time for 1536K FFT length: 1.829 ms., avg: 1.949 ms.
Best time for 1792K FFT length: 2.243 ms., avg: 2.344 ms.
Best time for 2048K FFT length: 2.567 ms., avg: 2.698 ms.
Best time for 2560K FFT length: 3.169 ms., avg: 3.235 ms.
Best time for 3072K FFT length: 3.800 ms., avg: 3.926 ms.
Best time for 3584K FFT length: 4.545 ms., avg: 4.642 ms.
Best time for 4096K FFT length: 5.245 ms., avg: 5.416 ms.
Best time for 5120K FFT length: 6.741 ms., avg: 6.901 ms.
Best time for 6144K FFT length: 8.261 ms., avg: 8.528 ms.
Best time for 7168K FFT length: 9.749 ms., avg: 10.051 ms.
Best time for 8192K FFT length: 11.388 ms., avg: 11.655 ms.
Timing FFTs using 8 threads on 4 physical CPUs.
Best time for 1024K FFT length: 1.210 ms., avg: 1.416 ms.
Best time for 1280K FFT length: 1.483 ms., avg: 1.551 ms.
Best time for 1536K FFT length: 1.844 ms., avg: 1.947 ms.
Best time for 1792K FFT length: 2.175 ms., avg: 2.229 ms.
Best time for 2048K FFT length: 2.602 ms., avg: 2.706 ms.
Best time for 2560K FFT length: 3.201 ms., avg: 3.321 ms.
Best time for 3072K FFT length: 3.891 ms., avg: 3.982 ms.
Best time for 3584K FFT length: 4.634 ms., avg: 4.764 ms.
Best time for 4096K FFT length: 5.333 ms., avg: 5.432 ms.
Best time for 5120K FFT length: 6.840 ms., avg: 6.976 ms.
Best time for 6144K FFT length: 8.915 ms., avg: 9.150 ms.
Best time for 7168K FFT length: 10.590 ms., avg: 10.740 ms.
Best time for 8192K FFT length: 12.539 ms., avg: 12.733 ms.
Timings for 1024K FFT length (4 cpus, 4 workers):  4.96,  4.95,  4.97,  4.95 ms.  Throughput: 806.81 iter/sec.
Timings for 1024K FFT length (4 cpus hyperthreaded, 4 workers):  5.28,  5.22,  5.13,  5.14 ms.  Throughput: 770.28 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers):  6.21,  6.34,  6.20,  6.16 ms.  Throughput: 642.27 iter/sec.
Timings for 1280K FFT length (4 cpus hyperthreaded, 4 workers):  6.47,  6.42,  6.43,  6.45 ms.  Throughput: 621.07 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers):  7.49,  7.49,  7.49,  7.49 ms.  Throughput: 533.95 iter/sec.
Timings for 1536K FFT length (4 cpus hyperthreaded, 4 workers):  7.93,  7.83,  7.84,  7.86 ms.  Throughput: 508.43 iter/sec.
[Fri Jul 15 08:54:28 2016]
Timings for 1792K FFT length (4 cpus, 4 workers):  9.55,  9.21,  9.26,  9.31 ms.  Throughput: 428.84 iter/sec.
Timings for 1792K FFT length (4 cpus hyperthreaded, 4 workers):  9.61,  9.41,  9.30,  9.44 ms.  Throughput: 423.91 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 10.65, 10.47, 10.41, 10.43 ms.  Throughput: 381.42 iter/sec.
Timings for 2048K FFT length (4 cpus hyperthreaded, 4 workers): 10.90, 10.85, 10.92, 10.84 ms.  Throughput: 367.74 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 12.96, 12.86, 12.91, 12.94 ms.  Throughput: 309.66 iter/sec.
Timings for 2560K FFT length (4 cpus hyperthreaded, 4 workers): 13.58, 13.45, 13.57, 13.37 ms.  Throughput: 296.46 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 15.68, 15.77, 15.58, 15.60 ms.  Throughput: 255.48 iter/sec.
Timings for 3072K FFT length (4 cpus hyperthreaded, 4 workers): 16.33, 16.05, 16.33, 16.84 ms.  Throughput: 244.15 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 18.66, 18.70, 18.82, 18.65 ms.  Throughput: 213.79 iter/sec.
Timings for 3584K FFT length (4 cpus hyperthreaded, 4 workers): 19.34, 19.13, 19.45, 19.06 ms.  Throughput: 207.84 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 21.59, 21.90, 21.59, 21.56 ms.  Throughput: 184.69 iter/sec.
Timings for 4096K FFT length (4 cpus hyperthreaded, 4 workers): 22.33, 22.09, 22.61, 22.08 ms.  Throughput: 179.58 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 27.43, 27.40, 27.36, 27.63 ms.  Throughput: 145.69 iter/sec.
Timings for 5120K FFT length (4 cpus hyperthreaded, 4 workers): 28.31, 27.85, 28.37, 27.91 ms.  Throughput: 142.30 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 32.61, 32.96, 32.61, 32.87 ms.  Throughput: 122.09 iter/sec.
Timings for 6144K FFT length (4 cpus hyperthreaded, 4 workers): 36.28, 36.26, 36.58, 35.64 ms.  Throughput: 110.55 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 39.57, 39.59, 38.51, 38.68 ms.  Throughput: 102.35 iter/sec.
Timings for 7168K FFT length (4 cpus hyperthreaded, 4 workers): 43.56, 43.24, 44.01, 42.34 ms.  Throughput: 92.43 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 44.88, 44.96, 44.87, 44.86 ms.  Throughput: 89.10 iter/sec.
Timings for 8192K FFT length (4 cpus hyperthreaded, 4 workers): 51.17, 50.98, 51.37, 49.89 ms.  Throughput: 78.66 iter/sec.
memory 2*8GB DDR4 2400MHz CL= 16
Antonio is offline   Reply With Quote
Old 2016-07-15, 14:55   #686
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

1100111100012 Posts
Default

Quote:
Originally Posted by Antonio View Post
MSI GP62 laptop, after upgrade...
You'll probably get some interesting tests by comparing things like "1 thread on 4 physical CPUs" and also you may as well disable testing on hyperthreaded cores entirely (like 8 threads on 4 physical CPUs). No matter what the benchmarks show, it won't help and will only slow things down.

Going from single to dual/triple/quad channel is a pretty nice performance bump in real-world testing, but some of the benchmarks you had, like "1 thread on one CPU", don't reflect that.
Madpoo is offline   Reply With Quote
Old 2016-07-15, 15:50   #687
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

7×292 Posts
Default

Quote:
Originally Posted by Madpoo View Post
You'll probably get some interesting tests by comparing things like "1 thread on 4 physical CPUs" and also you may as well disable testing on hyperthreaded cores entirely (like 8 threads on 4 physical CPUs). No matter what the benchmarks show, it won't help and will only slow things down.

Going from single to dual/triple/quad channel is a pretty nice performance bump in real-world testing, but some of the benchmarks you had, like "1 thread on one CPU", don't reflect that.
Single-channel memory basically made it perform like a dual-core CPU; dual channel doubled the throughput, and even faster memory would still increase performance. For 8192K, the single-thread time divided by 4 gives 9.5 ms/iter per core, while the 4-thread run was 11.6 ms/iter.
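That comparison can be restated as a small scaling check. The numbers below are the 8192K averages from Antonio's dual-channel DDR4-2400 benchmark above, and the "ideal" figure simply assumes perfect linear scaling, which is an assumption since the FFT is memory-bandwidth-bound:

```python
# How close does the 4-thread 8192K FFT come to perfect linear
# scaling over the 1-thread time? Numbers from the DDR4-2400 post.

def ideal_ms(single_thread_ms, threads):
    """Per-iteration time if the FFT scaled perfectly across threads."""
    return single_thread_ms / threads

def scaling_efficiency(single_thread_ms, threads, measured_ms):
    """Fraction of the ideal speedup actually achieved (1.0 = perfect)."""
    return ideal_ms(single_thread_ms, threads) / measured_ms

single = 38.156    # 1-thread avg, ms
measured = 11.655  # 4 threads on 4 physical CPUs, avg, ms
print(ideal_ms(single, 4))                      # ~9.5 ms
print(scaling_efficiency(single, 4, measured))  # ~0.82
```

An efficiency near 0.82 rather than 1.0 is consistent with the remaining bottleneck being shared memory bandwidth rather than core count.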
henryzz is online now   Reply With Quote
Old 2016-07-19, 15:35   #688
petrw1
1976 Toyota Corona years forever!
 
petrw1's Avatar
 
"Wayne"
Nov 2006
Saskatchewan, Canada

3×5×313 Posts
Default My son's new PC Intel Core i7-6700 @ 3.40GHz ... multi-worker slowdown...

Intel Core i7-6700 @ 3.40GHz Windows64, Prime95, v28.9, build 2
12 GB of RAM; still trying to find out the specs and how many sticks it's in.

Looking at 2048K, for example, the per-worker time increase from 1 core running alone to 4 cores running concurrently is more than double. This is far more than I expected or have seen in my experience.
Anyone else have similar results with this chip?
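The slowdown being described can be quantified as a per-worker factor. The sketch below is illustrative only: the single-worker 7.4 ms is the 2048K 1-thread best time from the benchmark that follows, but the 4-worker timings are hypothetical placeholders, since that part of the output is truncated.

```python
# Per-worker slowdown when N workers run at once versus one worker
# running alone. Factors near 1.0 mean the workers scale
# independently; factors above ~2 suggest contention for shared
# memory bandwidth rather than for cores.

def worker_slowdown(single_worker_ms, concurrent_ms):
    """Slowdown factor for each worker relative to running alone."""
    return [t / single_worker_ms for t in concurrent_ms]

# 7.4 ms alone (2048K, 1 thread) versus hypothetical ~16 ms per
# worker with four running concurrently -> each factor exceeds 2.
factors = worker_slowdown(7.4, [16.1, 15.8, 16.3, 15.9])
print([round(f, 2) for f in factors])
```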

Code:
[Sun Jul 17 20:59:50 2016]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
CPU speed: 3521.18 MHz, 4 hyperthreaded cores
CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.9, RdtscTiming=1
Best time for 1024K FFT length: 3.570 ms., avg: 3.604 ms.
Best time for 1280K FFT length: 4.571 ms., avg: 6.510 ms.
Best time for 1536K FFT length: 5.536 ms., avg: 5.745 ms.
Best time for 1792K FFT length: 6.943 ms., avg: 7.024 ms.
Best time for 2048K FFT length: 7.407 ms., avg: 7.481 ms.
Best time for 2560K FFT length: 9.861 ms., avg: 9.932 ms.
Best time for 3072K FFT length: 11.675 ms., avg: 11.778 ms.
Best time for 3584K FFT length: 14.029 ms., avg: 14.130 ms.
Best time for 4096K FFT length: 16.201 ms., avg: 16.353 ms.
Best time for 5120K FFT length: 20.684 ms., avg: 20.823 ms.
Best time for 6144K FFT length: 22.821 ms., avg: 23.019 ms.
Best time for 7168K FFT length: 27.455 ms., avg: 27.791 ms.
Best time for 8192K FFT length: 31.462 ms., avg: 31.623 ms.
Timing FFTs using 2 threads on 1 physical CPU.
Best time for 1024K FFT length: 3.157 ms., avg: 3.229 ms.
Best time for 1280K FFT length: 4.081 ms., avg: 4.166 ms.
Best time for 1536K FFT length: 4.946 ms., avg: 5.099 ms.
Best time for 1792K FFT length: 6.124 ms., avg: 6.207 ms.
Best time for 2048K FFT length: 6.944 ms., avg: 7.045 ms.
Best time for 2560K FFT length: 8.992 ms., avg: 9.135 ms.
Best time for 3072K FFT length: 10.893 ms., avg: 11.219 ms.
Best time for 3584K FFT length: 13.051 ms., avg: 13.466 ms.
Best time for 4096K FFT length: 15.307 ms., avg: 15.678 ms.
Best time for 5120K FFT length: 19.976 ms., avg: 21.172 ms.
Best time for 6144K FFT length: 22.624 ms., avg: 23.150 ms.
Best time for 7168K FFT length: 27.619 ms., avg: 28.916 ms.
Best time for 8192K FFT length: 31.943 ms., avg: 32.385 ms.
Timing FFTs using 2 threads on 2 physical CPUs.
Best time for 1024K FFT length: 1.854 ms., avg: 1.946 ms.
Best time for 1280K FFT length: 2.374 ms., avg: 2.447 ms.
Best time for 1536K FFT length: 2.879 ms., avg: 3.060 ms.
Best time for 1792K FFT length: 3.608 ms., avg: 3.682 ms.
Best time for 2048K FFT length: 4.082 ms., avg: 4.214 ms.
Best time for 2560K FFT length: 5.216 ms., avg: 5.356 ms.
Best time for 3072K FFT length: 6.213 ms., avg: 6.409 ms.
Best time for 3584K FFT length: 7.509 ms., avg: 8.193 ms.
Best time for 4096K FFT length: 8.632 ms., avg: 8.801 ms.
Best time for 5120K FFT length: 11.083 ms., avg: 11.447 ms.
Best time for 6144K FFT length: 13.013 ms., avg: 13.370 ms.
Best time for 7168K FFT length: 15.645 ms., avg: 15.952 ms.
Best time for 8192K FFT length: 18.152 ms., avg: 18.940 ms.
Timing FFTs using 3 threads on 3 physical CPUs.
Best time for 1024K FFT length: 1.330 ms., avg: 1.486 ms.
Best time for 1280K FFT length: 1.801 ms., avg: 1.948 ms.
Best time for 1536K FFT length: 2.259 ms., avg: 2.613 ms.
Best time for 1792K FFT length: 2.963 ms., avg: 3.252 ms.
Best time for 2048K FFT length: 3.412 ms., avg: 3.622 ms.
Best time for 2560K FFT length: 4.411 ms., avg: 4.678 ms.
Best time for 3072K FFT length: 5.450 ms., avg: 5.671 ms.
Best time for 3584K FFT length: 6.406 ms., avg: 6.697 ms.
Best time for 4096K FFT length: 7.410 ms., avg: 7.748 ms.
Best time for 5120K FFT length: 9.428 ms., avg: 9.621 ms.
Best time for 6144K FFT length: 11.788 ms., avg: 12.182 ms.
Best time for 7168K FFT length: 13.998 ms., avg: 14.265 ms.
Best time for 8192K FFT length: 16.184 ms., avg: 16.330 ms.
Timing FFTs using 4 threads on 4 physical CPUs.
Best time for 1024K FFT length: 1.200 ms., avg: 1.414 ms.
Best time for 1280K FFT length: 1.802 ms., avg: 2.018 ms.
Best time for 1536K FFT length: 2.443 ms., avg: 2.739 ms.
Best time for 1792K FFT length: 3.074 ms., avg: 3.254 ms.
Best time for 2048K FFT length: 3.567 ms., avg: 3.759 ms.
Best time for 2560K FFT length: 4.583 ms., avg: 4.787 ms.
Best time for 3072K FFT length: 5.348 ms., avg: 5.619 ms.
Best time for 3584K FFT length: 6.476 ms., avg: 6.609 ms.
Best time for 4096K FFT length: 7.402 ms., avg: 7.995 ms.
Best time for 5120K FFT length: 9.288 ms., avg: 9.437 ms.
Best time for 6144K FFT length: 11.611 ms., avg: 11.711 ms.
Best time for 7168K FFT length: 13.623 ms., avg: 13.737 ms.
Best time for 8192K FFT length: 15.620 ms., avg: 15.758 ms.
Timing FFTs using 8 threads on 4 physical CPUs.
Best time for 1024K FFT length: 1.075 ms., avg: 1.156 ms.
Best time for 1280K FFT length: 1.786 ms., avg: 1.975 ms.
Best time for 1536K FFT length: 2.380 ms., avg: 2.444 ms.
Best time for 1792K FFT length: 2.926 ms., avg: 3.007 ms.
Best time for 2048K FFT length: 3.691 ms., avg: 3.754 ms.
Best time for 2560K FFT length: 4.640 ms., avg: 4.871 ms.
Best time for 3072K FFT length: 5.691 ms., avg: 5.820 ms.
Best time for 3584K FFT length: 6.788 ms., avg: 6.940 ms.
Best time for 4096K FFT length: 7.782 ms., avg: 7.942 ms.
Best time for 5120K FFT length: 9.900 ms., avg: 10.199 ms.
Best time for 6144K FFT length: 12.794 ms., avg: 12.960 ms.
Best time for 7168K FFT length: 15.184 ms., avg: 15.337 ms.
Best time for 8192K FFT length: 17.507 ms., avg: 17.683 ms.

Timings for 1024K FFT length (4 cpus, 4 workers):  7.47,  7.47,  7.47,  7.45 ms.  Throughput: 535.75 iter/sec.
Timings for 1024K FFT length (4 cpus hyperthreaded, 4 workers):  7.86,  7.71,  7.98,  7.86 ms.  Throughput: 509.38 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers):  9.67,  9.55,  9.67,  9.57 ms.  Throughput: 415.98 iter/sec.
Timings for 1280K FFT length (4 cpus hyperthreaded, 4 workers): 10.04,  9.76, 10.12,  9.90 ms.  Throughput: 402.04 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 11.43, 11.35, 11.37, 11.41 ms.  Throughput: 351.21 iter/sec.
Timings for 1536K FFT length (4 cpus hyperthreaded, 4 workers): 15.88, 11.97, 14.12, 11.38 ms.  Throughput: 305.20 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 14.92, 14.86, 14.48, 14.28 ms.  Throughput: 273.40 iter/sec.
Timings for 1792K FFT length (4 cpus hyperthreaded, 4 workers): 16.06, 13.78, 14.74, 13.77 ms.  Throughput: 275.24 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 15.98, 16.16, 15.99, 15.70 ms.  Throughput: 250.71 iter/sec.
[Sun Jul 17 21:04:54 2016]
Timings for 2048K FFT length (4 cpus hyperthreaded, 4 workers): 17.96, 15.66, 17.05, 15.99 ms.  Throughput: 240.76 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 20.11, 19.99, 20.01, 19.87 ms.  Throughput: 200.04 iter/sec.
Timings for 2560K FFT length (4 cpus hyperthreaded, 4 workers): 23.29, 20.56, 23.56, 20.85 ms.  Throughput: 182.00 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 25.51, 24.58, 25.04, 24.95 ms.  Throughput: 159.91 iter/sec.
Timings for 3072K FFT length (4 cpus hyperthreaded, 4 workers): 30.89, 24.83, 28.11, 23.89 ms.  Throughput: 150.08 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 27.43, 27.01, 27.49, 27.13 ms.  Throughput: 146.71 iter/sec.
Timings for 3584K FFT length (4 cpus hyperthreaded, 4 workers): 30.75, 27.89, 30.89, 28.32 ms.  Throughput: 136.06 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 30.89, 30.92, 31.37, 30.78 ms.  Throughput: 129.08 iter/sec.
Timings for 4096K FFT length (4 cpus hyperthreaded, 4 workers): 34.37, 31.07, 33.70, 31.19 ms.  Throughput: 123.01 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 40.89, 40.30, 40.49, 39.92 ms.  Throughput: 99.02 iter/sec.
Timings for 5120K FFT length (4 cpus hyperthreaded, 4 workers): 40.43, 38.60, 40.56, 39.41 ms.  Throughput: 100.67 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 49.29, 49.65, 48.93, 48.15 ms.  Throughput: 81.63 iter/sec.
Timings for 6144K FFT length (4 cpus hyperthreaded, 4 workers): 55.59, 49.50, 54.45, 50.23 ms.  Throughput: 76.46 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 59.85, 55.50, 55.58, 54.95 ms.  Throughput: 70.92 iter/sec.
Timings for 7168K FFT length (4 cpus hyperthreaded, 4 workers): 65.84, 59.25, 64.12, 59.48 ms.  Throughput: 64.47 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 65.07, 62.90, 61.91, 61.97 ms.  Throughput: 63.56 iter/sec.
Timings for 8192K FFT length (4 cpus hyperthreaded, 4 workers): 77.07, 69.87, 72.68, 69.53 ms.  Throughput: 55.43 iter/sec.
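A quick way to put numbers on the hyperthreading effect in the 4-worker runs above (throughputs copied verbatim from the listing; nothing here is measured independently):

```python
# (FFT length, throughput on 4 plain cores, throughput with hyperthreading),
# all in iter/sec, taken directly from the benchmark lines above.
runs = [
    ("1024K", 535.75, 509.38),
    ("2048K", 250.71, 240.76),
    ("4096K", 129.08, 123.01),
    ("5120K",  99.02, 100.67),
    ("8192K",  63.56,  55.43),
]

for fft, plain, ht in runs:
    change = (ht / plain - 1.0) * 100.0
    print(f"{fft}: {change:+.1f}% throughput from hyperthreading")
```

Hyperthreading costs a few percent at most sizes here, roughly breaks even around 5120K, and hurts most (about 13%) at 8192K.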
petrw1 is offline   Reply With Quote
Old 2016-07-19, 16:47   #689
diep
 
diep's Avatar
 
Sep 2006
The Netherlands

10110110012 Posts
Default

Interesting that the fastest timing slows down by less than a factor of 2 going from 2048K to 4096K.
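For reference, the slowdown diep mentions can be read off the "4 cpus, 4 workers" throughputs in the benchmark above (a quick sketch; "slowdown factor" here is just the ratio of throughputs at adjacent FFT sizes):

```python
# Best "4 cpus, 4 workers" throughputs (iter/sec) from the listing above.
tp = {"1024K": 535.75, "2048K": 250.71, "4096K": 129.08, "8192K": 63.56}

for small, big in [("1024K", "2048K"), ("2048K", "4096K"), ("4096K", "8192K")]:
    factor = tp[small] / tp[big]
    print(f"{small} -> {big}: slowdown factor {factor:.2f}")
```

Only the 2048K -> 4096K step comes in under 2.0; the other doublings cost slightly more than a factor of 2, as one would naively expect from the work growing a bit faster than linearly in the FFT length.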
diep is offline   Reply With Quote
Old 2016-07-19, 19:32   #690
thyw
 
Feb 2016
! North_America

2×43 Posts
Default

I have a laptop with an A10-7300 (Bulldozer, "4" cores, 1.9 GHz base, 3.2 GHz advertised turbo) and 4 GB of single-channel DDR3-1600.

(These "measurements" were done while running 1 worker with 4 threads, LLD at 2400K FFT.)
If I have TurboCore on, under full load HWMonitor shows the clock speed jumping between 1500 MHz and 1800 MHz. [After I wrote this, I found out TurboCore depends on TDP headroom. But HWMonitor displays just under 27 W (CPU + iGPU), and the power brick is rated for 65 W output, so I don't know why it's throttling.]
If it's running idle ("empty"), the frequency hovers around 2700 MHz (sometimes up to 3200 MHz), rarely 2400. Also, at idle the voltage is higher than under load (1.0250 V vs. 0.8875/0.8250 V).
But if I turn TurboCore off (via AMD OverDrive), the clock speed stays around 1900 MHz and delivers *slightly* better performance in Prime95. I can also safely lower the voltage by two hundredths of a volt (0.9250 -> 0.9062). With turbo off and under load it still draws ~27 W, with no throttling, staying at 1900 MHz.

turbo ON:
Code:
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
AMD A10-7300 Radeon R6, 10 Compute Cores 4C+6G 
CPU speed: 2395.63 MHz, 4 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, FMA
L1 cache size: 16 KB
L2 cache size: 2 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1024
Prime95 64-bit version 28.7, RdtscTiming=1
Best time for 1024K FFT length: 21.012 ms., avg: 21.389 ms.
Best time for 1280K FFT length: 27.735 ms., avg: 28.078 ms.
Best time for 1536K FFT length: 34.173 ms., avg: 34.768 ms.
Best time for 1792K FFT length: 41.935 ms., avg: 44.774 ms.
Best time for 2048K FFT length: 46.271 ms., avg: 47.349 ms.
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
AMD A10-7300 Radeon R6, 10 Compute Cores 4C+6G 
CPU speed: 2395.84 MHz, 4 cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, FMA
L1 cache size: 16 KB
L2 cache size: 2 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 64
L2 TLBS: 1024
Prime95 64-bit version 28.7, RdtscTiming=1
Best time for 1024K FFT length: 20.487 ms., avg: 20.742 ms.
Best time for 1280K FFT length: 26.143 ms., avg: 26.432 ms.
Best time for 1536K FFT length: 32.332 ms., avg: 32.517 ms.
Best time for 1792K FFT length: 39.126 ms., avg: 39.755 ms.
Best time for 2048K FFT length: 44.009 ms., avg: 44.569 ms.
Best time for 2560K FFT length: 57.529 ms., avg: 57.710 ms.
Best time for 3072K FFT length: 72.519 ms., avg: 72.998 ms.
Best time for 3584K FFT length: 87.655 ms., avg: 88.611 ms.
Best time for 4096K FFT length: 97.382 ms., avg: 97.850 ms.
Best time for 5120K FFT length: 127.888 ms., avg: 128.580 ms.
Best time for 6144K FFT length: 154.146 ms., avg: 155.329 ms.
Best time for 7168K FFT length: 186.440 ms., avg: 187.804 ms.
Best time for 8192K FFT length: 202.272 ms., avg: 203.456 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 18.111 ms., avg: 18.316 ms.
Best time for 1280K FFT length: 22.972 ms., avg: 23.453 ms.
Best time for 1536K FFT length: 27.972 ms., avg: 28.803 ms.
Best time for 1792K FFT length: 34.331 ms., avg: 34.766 ms.
Best time for 2048K FFT length: 37.691 ms., avg: 37.979 ms.
Best time for 2560K FFT length: 47.874 ms., avg: 48.734 ms.
Best time for 3072K FFT length: 59.266 ms., avg: 59.748 ms.
Best time for 3584K FFT length: 70.954 ms., avg: 71.914 ms.
Best time for 4096K FFT length: 77.935 ms., avg: 78.571 ms.
Best time for 5120K FFT length: 101.027 ms., avg: 102.357 ms.
Best time for 6144K FFT length: 125.350 ms., avg: 126.502 ms.
Best time for 7168K FFT length: 155.347 ms., avg: 157.291 ms.
Best time for 8192K FFT length: 163.312 ms., avg: 164.376 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 10.854 ms., avg: 11.488 ms.
Best time for 1280K FFT length: 14.103 ms., avg: 14.569 ms.
Best time for 1536K FFT length: 17.555 ms., avg: 17.937 ms.
Best time for 1792K FFT length: 20.858 ms., avg: 21.512 ms.
[Tue Jul 19 20:27:38 2016]
Best time for 2048K FFT length: 22.865 ms., avg: 23.905 ms.
Best time for 2560K FFT length: 29.358 ms., avg: 30.094 ms.
Best time for 3072K FFT length: 36.725 ms., avg: 37.258 ms.
Best time for 3584K FFT length: 43.474 ms., avg: 44.521 ms.
Best time for 4096K FFT length: 48.532 ms., avg: 49.001 ms.
Best time for 5120K FFT length: 62.695 ms., avg: 63.910 ms.
Best time for 6144K FFT length: 76.752 ms., avg: 78.699 ms.
Best time for 7168K FFT length: 94.217 ms., avg: 95.351 ms.
Best time for 8192K FFT length: 100.110 ms., avg: 101.384 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 9.770 ms., avg: 11.246 ms.
Best time for 1280K FFT length: 12.088 ms., avg: 12.931 ms.
Best time for 1536K FFT length: 14.851 ms., avg: 16.112 ms.
Best time for 1792K FFT length: 17.589 ms., avg: 19.017 ms.
Best time for 2048K FFT length: 19.822 ms., avg: 21.123 ms.
Best time for 2560K FFT length: 25.882 ms., avg: 26.596 ms.
Best time for 3072K FFT length: 32.460 ms., avg: 33.409 ms.
Best time for 3584K FFT length: 38.753 ms., avg: 39.180 ms.
Best time for 4096K FFT length: 42.714 ms., avg: 43.357 ms.
Best time for 5120K FFT length: 56.189 ms., avg: 57.169 ms.
Best time for 6144K FFT length: 69.708 ms., avg: 70.484 ms.
Best time for 7168K FFT length: 83.472 ms., avg: 85.780 ms.
Best time for 8192K FFT length: 88.411 ms., avg: 98.615 ms.

Timings for 1024K FFT length (1 cpu, 1 worker): 19.08 ms.  Throughput: 52.41 iter/sec.
Timings for 1024K FFT length (2 cpus, 2 workers): 30.83, 30.73 ms.  Throughput: 64.98 iter/sec.
Timings for 1024K FFT length (3 cpus, 3 workers): 36.45, 36.75, 24.03 ms.  Throughput: 96.26 iter/sec.
Timings for 1024K FFT length (4 cpus, 4 workers): 38.62, 38.46, 39.60, 38.32 ms.  Throughput: 103.25 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker): 25.86 ms.  Throughput: 38.68 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 40.79, 40.62 ms.  Throughput: 49.14 iter/sec.
Timings for 1280K FFT length (3 cpus, 3 workers): 45.73, 45.53, 29.42 ms.  Throughput: 77.82 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 49.08, 48.89, 49.90, 48.51 ms.  Throughput: 81.49 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker): 32.07 ms.  Throughput: 31.18 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 50.01, 49.83 ms.  Throughput: 40.07 iter/sec.
[Tue Jul 19 20:32:42 2016]
Timings for 1536K FFT length (3 cpus, 3 workers): 58.05, 57.81, 37.81 ms.  Throughput: 60.97 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 59.63, 59.29, 59.32, 60.33 ms.  Throughput: 67.07 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker): 38.32 ms.  Throughput: 26.10 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 60.33, 60.10 ms.  Throughput: 33.22 iter/sec.
Timings for 1792K FFT length (3 cpus, 3 workers): 70.41, 70.08, 45.48 ms.  Throughput: 50.46 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 71.60, 71.37, 71.69, 72.17 ms.  Throughput: 55.78 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker): 42.36 ms.  Throughput: 23.61 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 66.20, 65.89 ms.  Throughput: 30.28 iter/sec.
Timings for 2048K FFT length (3 cpus, 3 workers): 77.35, 76.60, 49.87 ms.  Throughput: 46.04 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 79.95, 78.11, 79.05, 78.54 ms.  Throughput: 50.69 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker): 53.05 ms.  Throughput: 18.85 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 83.82, 83.45 ms.  Throughput: 23.91 iter/sec.
Timings for 2560K FFT length (3 cpus, 3 workers): 99.98, 97.41, 63.41 ms.  Throughput: 36.04 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 103.25, 100.95, 102.96, 101.56 ms.  Throughput: 39.15 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 64.69 ms.  Throughput: 15.46 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 102.04, 101.78 ms.  Throughput: 19.63 iter/sec.
Timings for 3072K FFT length (3 cpus, 3 workers): 121.09, 120.65, 78.58 ms.  Throughput: 29.27 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 124.71, 126.70, 126.20, 124.88 ms.  Throughput: 31.84 iter/sec.
Timings for 3584K FFT length (1 cpu, 1 worker): 78.32 ms.  Throughput: 12.77 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 124.12, 123.67 ms.  Throughput: 16.14 iter/sec.
Timings for 3584K FFT length (3 cpus, 3 workers): 145.03, 147.72, 93.75 ms.  Throughput: 24.33 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 150.50, 152.99, 152.35, 150.76 ms.  Throughput: 26.38 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 86.55 ms.  Throughput: 11.55 iter/sec.
[Tue Jul 19 20:37:47 2016]
Timings for 4096K FFT length (2 cpus, 2 workers): 135.66, 134.76 ms.  Throughput: 14.79 iter/sec.
Timings for 4096K FFT length (3 cpus, 3 workers): 159.01, 153.26, 102.91 ms.  Throughput: 22.53 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 165.86, 165.40, 170.38, 165.40 ms.  Throughput: 23.99 iter/sec.
Timings for 5120K FFT length (1 cpu, 1 worker): 110.98 ms.  Throughput:  9.01 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 176.28, 175.66 ms.  Throughput: 11.37 iter/sec.
Timings for 5120K FFT length (3 cpus, 3 workers): 208.70, 207.80, 135.28 ms.  Throughput: 17.00 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 215.23, 219.65, 221.52, 218.94 ms.  Throughput: 18.28 iter/sec.
Timings for 6144K FFT length (1 cpu, 1 worker): 133.68 ms.  Throughput:  7.48 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 217.40, 216.90 ms.  Throughput:  9.21 iter/sec.
Timings for 6144K FFT length (3 cpus, 3 workers): 254.20, 255.33, 162.20 ms.  Throughput: 14.02 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 269.72, 273.19, 273.72, 277.08 ms.  Throughput: 14.63 iter/sec.
Timings for 7168K FFT length (1 cpu, 1 worker): 159.99 ms.  Throughput:  6.25 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 269.08, 268.61 ms.  Throughput:  7.44 iter/sec.
Timings for 7168K FFT length (3 cpus, 3 workers): 328.05, 323.66, 198.59 ms.  Throughput: 11.17 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 341.18, 330.05, 381.76, 364.24 ms.  Throughput: 11.33 iter/sec.
Timings for 8192K FFT length (1 cpu, 1 worker): 176.12 ms.  Throughput:  5.68 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 294.70, 291.70 ms.  Throughput:  6.82 iter/sec.
Timings for 8192K FFT length (3 cpus, 3 workers): 366.75, 377.89, 214.39 ms.  Throughput: 10.04 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 394.19, 400.36, 400.10, 396.22 ms.  Throughput: 10.06 iter/sec.
turbo OFF:
http://pastebin.com/CBgMUUEh
(linked externally due to the post character limit)

Also (it may be placebo), in the worker windows, setting CPU affinity to "run on any CPU" seems to give better times than pinning to "CPU #1", at least when running 4 threads.

Last fiddled with by thyw on 2016-07-19 at 19:35
thyw is offline   Reply With Quote
Old 2016-07-20, 13:27   #691
Xyzzy
 
Xyzzy's Avatar
 
"Mike"
Aug 2002

5·17·97 Posts
Default

Quote:
Originally Posted by petrw1 View Post
…12 GB of RAM…
It sounds like an 8GB and 4GB stick. Maybe it is not in dual-channel mode due to mismatched banks?
Xyzzy is offline   Reply With Quote
Old 2016-07-20, 14:13   #692
Mark Rose
 
Mark Rose's Avatar
 
"/X\(‘-‘)/X\"
Jan 2013

2×5×293 Posts
Default

Quote:
Originally Posted by Xyzzy View Post
It sounds like an 8GB and 4GB stick. Maybe it is not in dual-channel mode due to mismatched banks?
Indeed.
Mark Rose is offline   Reply With Quote
Old 2016-07-20, 15:15   #693
petrw1
1976 Toyota Corona years forever!
 
petrw1's Avatar
 
"Wayne"
Nov 2006
Saskatchewan, Canada

111278 Posts
Default

Quote:
Originally Posted by Xyzzy View Post
It sounds like an 8GB and 4GB stick. Maybe it is not in dual-channel mode due to mismatched banks?
To clarify: does that mean both sticks are on the same side of the memory bank instead of one on each side?
petrw1 is offline   Reply With Quote
Old 2016-07-20, 17:49   #694
Mark Rose
 
Mark Rose's Avatar
 
"/X\(‘-‘)/X\"
Jan 2013

2·5·293 Posts
Default

Quote:
Originally Posted by petrw1 View Post
To clarify does that mean that both sticks are on the same side of the memory bank instead of 1 on each side?
If both memory channels don't have the same amount of memory, then the memory access can't be interleaved between the two channels.
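As an aside: some memory controllers (e.g. Intel's "Flex Mode") can still interleave the matched portion of mismatched channels and serve the remainder single-channel. A toy model (illustration only, not any real controller's address mapping) makes the idea concrete:

```python
LINE = 64  # cache-line size in bytes

def channel(addr, ch0_bytes, ch1_bytes):
    """Toy mapping: which channel serves the cache line at byte address addr."""
    interleaved = 2 * min(ch0_bytes, ch1_bytes)   # region both channels can serve
    if addr < interleaved:
        return (addr // LINE) % 2                 # lines alternate: dual-channel speed
    return 0 if ch0_bytes > ch1_bytes else 1      # tail lives on the larger channel

GB = 1 << 30
# 8 GB + 4 GB: the first 8 GB interleaves, the last 4 GB is single-channel.
assert channel(0 * LINE, 8 * GB, 4 * GB) == 0
assert channel(1 * LINE, 8 * GB, 4 * GB) == 1   # adjacent lines alternate
assert channel(9 * GB,   8 * GB, 4 * GB) == 0   # beyond 8 GB: only the 8 GB stick
```

In this model a 12 GB (8+4) machine still runs dual-channel for working sets under 8 GB, which covers any Prime95 FFT discussed here; whether a given board actually does this depends on the chipset and BIOS.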
Mark Rose is offline   Reply With Quote
Old 2016-07-20, 18:18   #695
petrw1
1976 Toyota Corona years forever!
 
petrw1's Avatar
 
"Wayne"
Nov 2006
Saskatchewan, Canada

10010010101112 Posts
Default

Quote:
Originally Posted by Mark Rose View Post
If both memory channels don't have the same amount of memory, then the memory access can't be interleaved between the two channels.
What if he has two 4 GB sticks in one bank and one 4 GB stick in the other?
Is he best to simply remove one of the pair of 4s?

Could I have two 4 GB sticks on one side and an 8 GB on the other?
Do all the sticks have to have the same specs?
...I mean, for interleaving to be supported and work effectively?

Last fiddled with by petrw1 on 2016-07-20 at 19:00 Reason: Last paragraph
petrw1 is offline   Reply With Quote
Old 2016-07-20, 20:59   #696
Mark Rose
 
Mark Rose's Avatar
 
"/X\(‘-‘)/X\"
Jan 2013

293010 Posts
Default

I can't say for certain, but I'd only put them in slots like this:

4 4 x x
8 8 x x
8 8 4 4

OR

4 x 4 x
8 x 8 x
8 4 8 4

It depends on your motherboard whether slots 1 and 2, or slots 1 and 3, belong to the same channel. You'll have to check the manual.
Mark Rose is offline   Reply With Quote
Old 2016-07-21, 02:36   #697
kladner
 
kladner's Avatar
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts
Default

Channels are frequently color-coded on the RAM slots.
kladner is offline   Reply With Quote
Old 2016-07-21, 04:53   #698
kladner
 
kladner's Avatar
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts
Default

weirdness. double post. I didn't think the above had gone through.

Last fiddled with by kladner on 2016-07-21 at 04:54
kladner is offline   Reply With Quote
Old 2016-07-21, 15:30   #699
petrw1
1976 Toyota Corona years forever!
 
petrw1's Avatar
 
"Wayne"
Nov 2006
Saskatchewan, Canada

10010010101112 Posts
Default

If the computer/motherboard came with DDR3, can I replace it with DDR4?
I notice they have different pin counts.

If it helps it is this OEM PC:

Acer Aspire T3 Gaming PC (Intel Ci7-6700)

Last fiddled with by petrw1 on 2016-07-21 at 15:33
petrw1 is offline   Reply With Quote
Old 2016-07-21, 15:42   #700
lalera
 
lalera's Avatar
 
Jul 2003

13×47 Posts
Default

Quote:
Originally Posted by petrw1 View Post
If the computer/motherboard came with DDR3, can I replace it with DDR4?
I notice they have different pin counts.

If it helps it is this OEM PC:

Acer Aspire T3 Gaming PC (Intel Ci7-6700)
Hi,
No, you cannot replace the DDR3 DIMMs with DDR4 DIMMs.
(Exception: some mainboards have both kinds of slots, but those are usually not good.)

Last fiddled with by lalera on 2016-07-21 at 15:46
lalera is offline   Reply With Quote
