#651
Serpentine Vermin Jar
Jul 2014
CF1₁₆ Posts
Quote:
That's another thing where George might get some optimizations: targeting the chunk of data being worked on to the L3 cache size of that core, be it 1.5 MB, 2 MB, 2.5 MB, etc. That is, would it be faster to do several smaller chunks of work that fit in cache, or one larger chunk that would, by necessity, have to go out to main RAM?
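The chunking tradeoff described above can be sketched numerically. This is a hypothetical illustration (the helper and the element size are mine, not anything Prime95 actually does):

```python
# Rough sketch of the tradeoff described above: split a big buffer into
# chunks sized to fit one core's L3 slice, instead of streaming the whole
# working set through main RAM at once. Sizes are illustrative only.

def chunks_for_cache(total_bytes, l3_bytes_per_core, elem_size=8):
    """Number of chunks and elements per chunk so each chunk fits in L3."""
    elems_per_chunk = max(1, l3_bytes_per_core // elem_size)
    total_elems = total_bytes // elem_size
    n_chunks = -(-total_elems // elems_per_chunk)  # ceiling division
    return n_chunks, elems_per_chunk

# A 4096K FFT of doubles is 4096 * 1024 * 8 bytes = 32 MiB of data;
# with ~2.5 MB of L3 per core it would need ~14 cache-sized chunks.
print(chunks_for_cache(4096 * 1024 * 8, 2_500_000))
```

The point is only that a 4M-element working set is more than an order of magnitude bigger than one core's L3 slice, so either the chunks shrink or the traffic goes to DRAM.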
#652
Aug 2002
10101110₂ Posts
Quote:
Should've got the 2696v3/2699v3 with 45MB L3 then
#653
Jun 2003
5087₁₀ Posts
#654
Serpentine Vermin Jar
Jul 2014
110011110001₂ Posts
Quote:
Things like L2/L3 cache sizes, memory speed, FFT sizes, # of cores (threads) per worker, total # of workers, whether you have a single- or dual+-socket motherboard, etc. all play a part, and if you have the time and persistence you can figure out what works best for you, but I think the project would benefit from an automated "plug and play" configuration.

I've said it before and I'll say it again: I've seen obvious cases of horribly misconfigured systems doing multiple 4M+ FFT tests on a single CPU, and I just know it would be orders of magnitude more efficient to run one worker using all cores. Those are the systems where I can see 4 LL tests assigned, reporting results daily with minimal progress. Either they only run it an hour a day, or the memory contention is as bad as I think it is, making all of them slow as molasses.
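As a rough illustration of the kind of automated heuristic being suggested, here is a hypothetical sketch. The threshold logic and all the numbers are invented for illustration; this is not Prime95's actual tuning:

```python
# Hypothetical "plug and play" heuristic in the spirit of the post above:
# prefer one worker using all cores when N independent workers would each
# carry a working set far beyond the shared L3 and fight over memory
# bandwidth. Thresholds are made up.

def suggest_workers(cores, l3_mb, fft_k):
    """Return (workers, cores_per_worker) for a single-socket machine."""
    workset_mb = fft_k * 8 / 1024.0           # ~8 bytes per FFT element
    if workset_mb * cores > l3_mb:
        return 1, cores                        # one big multithreaded worker
    return cores, 1                            # small FFTs: one per core

# A 4-core desktop with 8 MB L3 running 4096K FFT tests:
print(suggest_workers(4, 8, 4096))  # -> (1, 4)
```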
#655
Serpentine Vermin Jar
Jul 2014
3313₁₀ Posts
Quote:
In the case of the 16/18-core Haswell chips, they are clocked slightly slower than the 14-core, so I guess it would be only slightly faster. AirSquirrels proved this by doing a check on the same exponent as me... him with dual 16-core Xeons, me with dual 14-core. In my case I could only get to 20-22 cores (14 on one chip, the rest on the other) before I started to see that adding more wasn't improving performance, and actually started hindering it. In his case, he had all 32 cores working together and says he didn't see any drop in speed, but I don't know if he tested other total core counts like 24, 26, 28, 30, etc. In the end, what took me 34 hours took him around 32 (it was the verification for M49).

Looks like Broadwell E5 will have the same 2.5 MB of L3 cache per core... I haven't seen any hard info on the L3 size (per core) on Skylake, but it's probably still 2.5 MB unless they surprise us all with 3-4 MB.

I'm still waiting for Knights Landing and its up-to-16 GB of fast memory, which I guess you could call L4 if the system is configured to use it as a fast cache in front of main RAM. Again, no hard numbers on how the KNL memory bandwidth would compare with the L3 speed we enjoy now, but it would be faster than 6-channel DDR4, and that's a very good thing.
#656
Aug 2002
AE₁₆ Posts
I ran the benchmark again, focusing on 4096K+ FFTs. It takes an incredibly long time to run, with most of the time spent on the different combos of workers and threads. It would be good if George would consider the following.

For my initial results, even at 8192K FFT the 2697v3 definitely prefers 1w/14c over 14w/14c. However, the advantage of 1w/14c over 14w/14c is smaller than it is at 4096K.

Last fiddled with by xtreme2k on 2016-03-01 at 23:13
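The worker/thread combos being benchmarked here (1w/14c, 14w/14c, and everything in between) are just the divisor splits of the core count, which a few lines can enumerate. A small sketch, with nothing Prime95-specific assumed:

```python
# Enumerate every (workers, threads_per_worker) split that uses all cores,
# i.e. the combinations a full workers-vs-threads benchmark sweeps over.

def worker_thread_combos(cores):
    """All (workers, threads_per_worker) pairs that use every core."""
    return [(w, cores // w) for w in range(1, cores + 1) if cores % w == 0]

# For a 14-core 2697v3 the sweep is small; it's the per-combo benchmark
# time that makes the full run take so long.
print(worker_thread_combos(14))
```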
#657
Mar 2006
480₁₀ Posts
Benchmarks for my E5-2687W v4 (12 cores, 3.0GHz) with 128GB = 8x 16GB DDR4-2133 ECC Reg CL15 on Windows 7 Pro SP1 x64.
I'm thinking of rerunning the test with 64GB = 4x 16GB DDR4-2133 to see if a single DIMM per channel makes a difference. In these initial tests I did see a couple of oddly high timings pop up; I think this may have happened when it hit the second DIMM on the memory channel (maybe?).

The 1st benchmark was run after downloading Prime95 28.9 and selecting Benchmark from the menu.

The 2nd benchmark used the following in prime.txt:
FullBench=1

The 3rd benchmark used the following in prime.txt:
MinBenchFFT=4096
MaxBenchFFT=4096
BenchHyperthreads=0
BenchMultithreads=1

The 4th benchmark used the following in prime.txt:
MinBenchFFT=8192
MaxBenchFFT=8192
BenchHyperthreads=0
BenchMultithreads=1

The 5th benchmark used the following in prime.txt:
FullBench=1
BenchHyperthreads=0
BenchMultithreads=1

I know this is a very small part of the program, but I think it would be very helpful if the Benchmark menu option brought up an options dialog. With it you could quickly choose from the above options and perhaps save the results to different files. Maybe the dialog could contain:

A radio button to select between "Standard Benchmark", "Full Benchmark", and maybe "Custom Benchmark"
A check box to use MinBenchFFT, which gets its input from a drop-down menu (I know this would be large with 127 options, but better than error-checking user input)
A check box to use MaxBenchFFT, which gets its input from a drop-down menu
A check box to bench hyperthreads, if that is a) available on the processor and b) turned on in the BIOS
A check box to bench multithreads, if the processor has multiple cores
Perhaps a way to specify the affinity map, to see how that affects benchmarks
Perhaps a way to specify a file name to save these benchmark results to a separate file
And controls for any other benchmark options I don't know about

Last fiddled with by WraithX on 2016-06-12 at 14:04
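The presets listed above could equally be generated from a small options table, in the spirit of the dialog being proposed. The prime.txt keys are the ones named in the post; the preset names and the rendering helper are hypothetical:

```python
# Render the benchmark presets from the post above as prime.txt lines.
# The keys (FullBench, MinBenchFFT, ...) are the ones named in the post;
# everything else here is an invented sketch, not part of Prime95.

PRESETS = {
    "full": {"FullBench": 1},
    "4096k_mt": {"MinBenchFFT": 4096, "MaxBenchFFT": 4096,
                 "BenchHyperthreads": 0, "BenchMultithreads": 1},
    "8192k_mt": {"MinBenchFFT": 8192, "MaxBenchFFT": 8192,
                 "BenchHyperthreads": 0, "BenchMultithreads": 1},
}

def prime_txt_lines(preset_name):
    """Render one preset as prime.txt 'Key=value' lines."""
    return ["%s=%d" % (k, v) for k, v in PRESETS[preset_name].items()]

print("\n".join(prime_txt_lines("4096k_mt")))
```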
#658
Serpentine Vermin Jar
Jul 2014
6361₈ Posts
Quote:
The differences will be the architecture itself (Broadwell), plus the v4 will have DDR4-2400 instead of the v3's DDR4-2133. The memory speed alone would make a big difference. Any reason you're using 2133 instead of 2400 memory? The E5-2687W v4 supports up to 2400, and since memory is the bottleneck with Prime95, you might take a look at that.

Depending on your motherboard, it may or may not reduce the memory clock if you're running 2 DPC. HP servers are "fun" in that if you're using official HP memory you can run full speed with 2 DPC, but if you cheap out and use 3rd-party memory, the BIOS will step the memory speed down from, for example, 1866 to 1600. That might not hold true for the Gen9 boxes, but it's definitely the case on their Gen8 ProLiant servers with Xeon E5 v1/v2 processors.

It can also vary with the rank of each module. I note with curiosity that the new server I'm getting will run 2 DPC @ 2400 MHz if the modules are dual-rank 16GB, but with single-rank 16GB modules it runs @ 2133 with 2 DPC. Doesn't matter to me, I'm only doing 1 DPC, so 8 DIMMs across both CPUs, but it's curious. It even says I could use load-reduced DIMMs (LRDIMMs) and get 3 DPC at the full 2400.

I'm imagining a system now with a full 24 x 128GB LRDIMM @ 2400 for a total of 3 TB of RAM. Pair that with a couple of 22-core E5-2699 v4s and you've got yourself one heckuva virtual host machine... or better yet, 4 of those CPUs once the E5-46xx v4 chips come out.
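The 2133-vs-2400 gap is easy to put numbers on: theoretical peak bandwidth per DDR4 channel is 8 bytes per transfer times the MT/s rate. Sustained bandwidth is of course lower; this only shows the headline ratio:

```python
# Back-of-the-envelope peak bandwidth for DDR4-2133 vs DDR4-2400 on a
# quad-channel socket: 8 bytes/transfer * MT/s * channels. Theoretical
# peak only; sustained numbers are lower.

def peak_bw_gbs(mts, channels):
    """Theoretical peak bandwidth in GB/s."""
    return mts * 8 * channels / 1000.0

bw_2133 = peak_bw_gbs(2133, 4)
bw_2400 = peak_bw_gbs(2400, 4)
print(round(bw_2133, 1), round(bw_2400, 1))  # ~68.3 vs ~76.8 GB/s
```

A ~12% memory-bandwidth bump is significant for a workload that is memory bound, which is the argument being made above.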
#659
Sep 2003
A19₁₆ Posts
How many watts does one of these use when it's doing LL tests at full capacity? Not including the additional air-conditioning burden, which may be hard to calculate directly.
#660
"/X\(‘-‘)/X\"
Jan 2013
2×5×293 Posts
The processor has a design power of 135 watts. Throw in some memory and chipset watts and power supply inefficiency and you're probably close to 175 watts per processor.
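Taking the ~175 W per-socket estimate above, the energy for one LL test is simple arithmetic. The run length is borrowed from the 34-hour dual-14-core figure quoted earlier in the thread; everything here is a rough illustration:

```python
# Rough energy cost of one LL test at the ~175 W/socket estimate above.
# The 34-hour runtime is the dual-14-core M49 verification time quoted
# earlier in the thread; this is illustrative arithmetic only.

def ll_test_kwh(watts, hours):
    """Energy in kWh for a run drawing `watts` for `hours`."""
    return watts * hours / 1000.0

energy = ll_test_kwh(2 * 175, 34)  # two sockets at ~175 W each
print(energy, "kWh")  # 11.9 kWh
```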
#661
"David"
Jul 2015
Ohio
1005₈ Posts
My 2011-v3 16 cores have a 135W TDP, but the entire system's power consumption goes up only ~110W from idle when I ramp up P95.
#662
Mar 2006
2⁵·3·5 Posts
Quote:
Someday I'll save up enough to make this a dual 2687W v4.

Quote:
Whoa, 3TB of RAM, that would be amazing! Btw, I wouldn't make it a VM host, I'd just run one OS and fill it up with ECM jobs!
#663
Aug 2002
10101110₂ Posts
Quote:
All I need is the 4096K 1/14t times
#664
"/X\(‘-‘)/X\"
Jan 2013
B72₁₆ Posts
So I ran some quick benchmarks:
c3.large:
Code:
[Work thread Jun 16 14:02] Timing 34 iterations of 1024K FFT length. Best time: 6.853 ms., avg time: 6.865 ms.
[Work thread Jun 16 14:02] Timing 27 iterations of 1280K FFT length. Best time: 8.677 ms., avg time: 8.706 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1536K FFT length. Best time: 10.443 ms., avg time: 10.483 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1792K FFT length. Best time: 12.547 ms., avg time: 12.623 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2048K FFT length. Best time: 14.705 ms., avg time: 14.766 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2560K FFT length. Best time: 17.644 ms., avg time: 17.709 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3072K FFT length. Best time: 21.346 ms., avg time: 21.412 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3584K FFT length. Best time: 25.685 ms., avg time: 25.720 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 4096K FFT length. Best time: 32.316 ms., avg time: 32.410 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 5120K FFT length. Best time: 40.270 ms., avg time: 40.385 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 6144K FFT length. Best time: 48.339 ms., avg time: 48.436 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 7168K FFT length. Best time: 57.406 ms., avg time: 57.462 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 8192K FFT length. Best time: 67.754 ms., avg time: 67.854 ms.
[Work thread Jun 16 14:02] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 16 14:02] Timing 34 iterations of 1024K FFT length. Best time: 7.234 ms., avg time: 7.271 ms.
[Work thread Jun 16 14:02] Timing 27 iterations of 1280K FFT length. Best time: 9.249 ms., avg time: 9.329 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1536K FFT length. Best time: 11.145 ms., avg time: 11.171 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1792K FFT length. Best time: 13.310 ms., avg time: 13.389 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2048K FFT length. Best time: 15.715 ms., avg time: 15.935 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2560K FFT length. Best time: 19.288 ms., avg time: 19.469 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 3072K FFT length. Best time: 24.058 ms., avg time: 24.236 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 3584K FFT length. Best time: 30.966 ms., avg time: 33.027 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 4096K FFT length. Best time: 32.997 ms., avg time: 33.571 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 5120K FFT length. Best time: 41.625 ms., avg time: 42.549 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 6144K FFT length. Best time: 50.459 ms., avg time: 50.620 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 7168K FFT length. Best time: 59.329 ms., avg time: 59.725 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 8192K FFT length. Best time: 71.724 ms., avg time: 72.220 ms.

c4.large:
Code:
[Work thread Jun 16 14:02] Timing 36 iterations of 1024K FFT length. Best time: 4.640 ms., avg time: 4.649 ms.
[Work thread Jun 16 14:02] Timing 29 iterations of 1280K FFT length. Best time: 5.899 ms., avg time: 5.908 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1536K FFT length. Best time: 7.101 ms., avg time: 7.116 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1792K FFT length. Best time: 8.769 ms., avg time: 8.781 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2048K FFT length. Best time: 9.593 ms., avg time: 9.661 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2560K FFT length. Best time: 12.410 ms., avg time: 12.563 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3072K FFT length. Best time: 15.232 ms., avg time: 15.350 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3584K FFT length. Best time: 18.626 ms., avg time: 18.801 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 4096K FFT length. Best time: 21.407 ms., avg time: 21.609 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 5120K FFT length. Best time: 27.879 ms., avg time: 27.976 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 6144K FFT length. Best time: 33.080 ms., avg time: 33.177 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 7168K FFT length. Best time: 39.618 ms., avg time: 39.721 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 8192K FFT length. Best time: 46.036 ms., avg time: 46.128 ms.
[Work thread Jun 16 14:02] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 16 14:02] Timing 36 iterations of 1024K FFT length. Best time: 4.547 ms., avg time: 4.571 ms.
[Work thread Jun 16 14:02] Timing 29 iterations of 1280K FFT length. Best time: 5.721 ms., avg time: 5.767 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1536K FFT length. Best time: 6.995 ms., avg time: 7.050 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 1792K FFT length. Best time: 8.491 ms., avg time: 8.570 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2048K FFT length. Best time: 10.148 ms., avg time: 10.372 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 2560K FFT length. Best time: 12.506 ms., avg time: 12.931 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3072K FFT length. Best time: 15.535 ms., avg time: 15.826 ms.
[Work thread Jun 16 14:02] Timing 25 iterations of 3584K FFT length. Best time: 19.055 ms., avg time: 19.465 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 4096K FFT length. Best time: 21.606 ms., avg time: 22.175 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 5120K FFT length. Best time: 28.121 ms., avg time: 28.367 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 6144K FFT length. Best time: 36.025 ms., avg time: 36.362 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 7168K FFT length. Best time: 42.364 ms., avg time: 43.682 ms.
[Work thread Jun 16 14:03] Timing 25 iterations of 8192K FFT length. Best time: 49.345 ms., avg time: 50.905 ms.

A 4770 at stock clocks with 1600 MHz RAM:
Code:
[Work thread Jun 16 10:11] Timing 44 iterations of 1024K FFT length. Best time: 4.281 ms., avg time: 4.897 ms.
[Work thread Jun 16 10:11] Timing 35 iterations of 1280K FFT length. Best time: 5.537 ms., avg time: 5.642 ms.
[Work thread Jun 16 10:11] Timing 29 iterations of 1536K FFT length. Best time: 6.675 ms., avg time: 6.781 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 1792K FFT length. Best time: 8.151 ms., avg time: 8.184 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 2048K FFT length. Best time: 9.124 ms., avg time: 9.188 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 2560K FFT length. Best time: 11.693 ms., avg time: 15.343 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 3072K FFT length. Best time: 14.091 ms., avg time: 14.156 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 3584K FFT length. Best time: 16.791 ms., avg time: 17.066 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 4096K FFT length. Best time: 19.295 ms., avg time: 24.632 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 5120K FFT length. Best time: 24.642 ms., avg time: 24.689 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 6144K FFT length. Best time: 29.367 ms., avg time: 33.022 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 7168K FFT length. Best time: 34.989 ms., avg time: 37.109 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 8192K FFT length. Best time: 40.438 ms., avg time: 40.463 ms.
[Work thread Jun 16 10:11] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 16 10:11] Timing 44 iterations of 1024K FFT length. Best time: 4.140 ms., avg time: 4.159 ms.
[Work thread Jun 16 10:11] Timing 35 iterations of 1280K FFT length. Best time: 5.422 ms., avg time: 5.643 ms.
[Work thread Jun 16 10:11] Timing 29 iterations of 1536K FFT length. Best time: 6.603 ms., avg time: 6.682 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 1792K FFT length. Best time: 7.751 ms., avg time: 7.832 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 2048K FFT length. Best time: 9.457 ms., avg time: 9.598 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 2560K FFT length. Best time: 11.483 ms., avg time: 11.732 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 3072K FFT length. Best time: 13.747 ms., avg time: 14.136 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 3584K FFT length. Best time: 16.652 ms., avg time: 17.001 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 4096K FFT length. Best time: 19.166 ms., avg time: 19.413 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 5120K FFT length. Best time: 24.249 ms., avg time: 24.386 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 6144K FFT length. Best time: 30.548 ms., avg time: 31.100 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 7168K FFT length. Best time: 36.463 ms., avg time: 36.643 ms.
[Work thread Jun 16 10:11] Timing 25 iterations of 8192K FFT length. Best time: 41.483 ms., avg time: 42.415 ms.

And a 4770K at stock clocks with 2400 MHz RAM:
Code:
[Work thread Jun 16 10:06] Timing 43 iterations of 1024K FFT length. Best time: 3.991 ms., avg time: 4.131 ms.
[Work thread Jun 16 10:06] Timing 34 iterations of 1280K FFT length. Best time: 5.160 ms., avg time: 5.196 ms.
[Work thread Jun 16 10:06] Timing 29 iterations of 1536K FFT length. Best time: 6.167 ms., avg time: 6.213 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 1792K FFT length. Best time: 7.617 ms., avg time: 7.806 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 2048K FFT length. Best time: 8.758 ms., avg time: 8.876 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 2560K FFT length. Best time: 10.741 ms., avg time: 10.937 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 3072K FFT length. Best time: 12.947 ms., avg time: 13.238 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 3584K FFT length. Best time: 15.444 ms., avg time: 15.773 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 4096K FFT length. Best time: 17.717 ms., avg time: 17.869 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 5120K FFT length. Best time: 22.808 ms., avg time: 23.032 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 6144K FFT length. Best time: 27.089 ms., avg time: 27.559 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 7168K FFT length. Best time: 31.962 ms., avg time: 32.342 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 8192K FFT length. Best time: 37.185 ms., avg time: 37.336 ms.
[Work thread Jun 16 10:06] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 16 10:06] Timing 43 iterations of 1024K FFT length. Best time: 3.879 ms., avg time: 3.929 ms.
[Work thread Jun 16 10:06] Timing 34 iterations of 1280K FFT length. Best time: 4.962 ms., avg time: 5.006 ms.
[Work thread Jun 16 10:06] Timing 29 iterations of 1536K FFT length. Best time: 6.096 ms., avg time: 6.119 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 1792K FFT length. Best time: 7.345 ms., avg time: 7.377 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 2048K FFT length. Best time: 8.563 ms., avg time: 8.803 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 2560K FFT length. Best time: 10.408 ms., avg time: 10.574 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 3072K FFT length. Best time: 12.569 ms., avg time: 12.814 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 3584K FFT length. Best time: 15.082 ms., avg time: 15.255 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 4096K FFT length. Best time: 17.042 ms., avg time: 17.344 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 5120K FFT length. Best time: 22.032 ms., avg time: 22.206 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 6144K FFT length. Best time: 27.740 ms., avg time: 28.096 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 7168K FFT length. Best time: 32.233 ms., avg time: 32.927 ms.
[Work thread Jun 16 10:06] Timing 25 iterations of 8192K FFT length. Best time: 37.702 ms., avg time: 38.482 ms.

This doesn't say much that we didn't already know. The c3.large and c4.large have two hyperthreads of a single CPU core. The E5-2666 v3 in the c4.large has a turbo of 3.3 GHz, but /proc/cpuinfo shows it running at 2.9 GHz. The E5-2680 v2 in the c3.large has a turbo of 3.6 GHz, but /proc/cpuinfo shows 2.8 GHz, and it also lacks AVX2. So a c4.large actually does pretty well for its clock speed, with little virtualization overhead.
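Comparing machines from these dumps by hand is tedious; the "Timing ... Best time:" lines are regular enough to parse. A small sketch (the format string is taken from the logs above; nothing else is Prime95-specific):

```python
import re

# Pull (fft_k, best_ms) pairs out of Prime95's benchmark output so logs
# from different machines can be compared programmatically.

TIMING_RE = re.compile(
    r"Timing \d+ iterations of (\d+)K FFT length\. "
    r"Best time: ([\d.]+) ms\.")

def parse_best_times(text):
    """Map FFT length in K -> best time in ms from a benchmark log."""
    return {int(k): float(ms) for k, ms in TIMING_RE.findall(text)}

log = ("[Work thread] Timing 25 iterations of 4096K FFT length. "
       "Best time: 21.407 ms., avg time: 21.609 ms.")
print(parse_best_times(log))  # {4096: 21.407}
```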
#665
"/X\(‘-‘)/X\"
Jan 2013
5562₈ Posts
Intel i3-4170 @ 3.7 GHz with DDR3-1333
Code:
[Work thread Jun 18 23:41] Timing 46 iterations of 1024K FFT length. Best time: 4.383 ms., avg time: 4.397 ms.
[Work thread Jun 18 23:41] Timing 36 iterations of 1280K FFT length. Best time: 5.575 ms., avg time: 5.584 ms.
[Work thread Jun 18 23:41] Timing 30 iterations of 1536K FFT length. Best time: 6.665 ms., avg time: 6.679 ms.
[Work thread Jun 18 23:41] Timing 26 iterations of 1792K FFT length. Best time: 8.349 ms., avg time: 8.662 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 2048K FFT length. Best time: 9.112 ms., avg time: 9.130 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 2560K FFT length. Best time: 11.567 ms., avg time: 11.588 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 3072K FFT length. Best time: 13.935 ms., avg time: 13.950 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 3584K FFT length. Best time: 16.531 ms., avg time: 16.549 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 4096K FFT length. Best time: 19.045 ms., avg time: 19.069 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 5120K FFT length. Best time: 24.178 ms., avg time: 24.200 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 6144K FFT length. Best time: 28.969 ms., avg time: 28.996 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 7168K FFT length. Best time: 34.644 ms., avg time: 34.673 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 8192K FFT length. Best time: 39.996 ms., avg time: 40.030 ms.
[Work thread Jun 18 23:41] Timing FFTs using 2 threads on 1 physical CPU.
[Work thread Jun 18 23:41] Timing 46 iterations of 1024K FFT length. Best time: 4.590 ms., avg time: 4.607 ms.
[Work thread Jun 18 23:41] Timing 36 iterations of 1280K FFT length. Best time: 5.864 ms., avg time: 5.898 ms.
[Work thread Jun 18 23:41] Timing 30 iterations of 1536K FFT length. Best time: 7.092 ms., avg time: 7.133 ms.
[Work thread Jun 18 23:41] Timing 26 iterations of 1792K FFT length. Best time: 8.087 ms., avg time: 8.183 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 2048K FFT length. Best time: 9.698 ms., avg time: 9.780 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 2560K FFT length. Best time: 11.710 ms., avg time: 11.870 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 3072K FFT length. Best time: 14.284 ms., avg time: 14.384 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 3584K FFT length. Best time: 17.137 ms., avg time: 17.236 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 4096K FFT length. Best time: 19.351 ms., avg time: 19.594 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 5120K FFT length. Best time: 24.615 ms., avg time: 24.733 ms.
[Work thread Jun 18 23:41] Timing 25 iterations of 6144K FFT length. Best time: 30.627 ms., avg time: 30.673 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 7168K FFT length. Best time: 36.243 ms., avg time: 36.295 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 8192K FFT length. Best time: 42.296 ms., avg time: 42.369 ms.
[Work thread Jun 18 23:42] Timing FFTs using 2 threads on 2 physical CPUs.
[Work thread Jun 18 23:42] Timing 46 iterations of 1024K FFT length. Best time: 2.774 ms., avg time: 2.800 ms.
[Work thread Jun 18 23:42] Timing 36 iterations of 1280K FFT length. Best time: 3.636 ms., avg time: 3.658 ms.
[Work thread Jun 18 23:42] Timing 30 iterations of 1536K FFT length. Best time: 3.753 ms., avg time: 3.815 ms.
[Work thread Jun 18 23:42] Timing 26 iterations of 1792K FFT length. Best time: 4.565 ms., avg time: 4.680 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 2048K FFT length. Best time: 5.184 ms., avg time: 5.226 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 2560K FFT length. Best time: 6.801 ms., avg time: 6.818 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 3072K FFT length. Best time: 8.201 ms., avg time: 8.223 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 3584K FFT length. Best time: 9.548 ms., avg time: 9.641 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 4096K FFT length. Best time: 10.746 ms., avg time: 10.757 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 5120K FFT length. Best time: 13.467 ms., avg time: 13.509 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 6144K FFT length. Best time: 16.185 ms., avg time: 16.287 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 7168K FFT length. Best time: 19.323 ms., avg time: 19.439 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 8192K FFT length. Best time: 22.326 ms., avg time: 22.443 ms.
[Work thread Jun 18 23:42] Timing FFTs using 4 threads on 2 physical CPUs.
[Work thread Jun 18 23:42] Timing 46 iterations of 1024K FFT length. Best time: 2.793 ms., avg time: 2.822 ms.
[Work thread Jun 18 23:42] Timing 36 iterations of 1280K FFT length. Best time: 3.612 ms., avg time: 3.633 ms.
[Work thread Jun 18 23:42] Timing 30 iterations of 1536K FFT length. Best time: 4.079 ms., avg time: 4.153 ms.
[Work thread Jun 18 23:42] Timing 26 iterations of 1792K FFT length. Best time: 4.796 ms., avg time: 4.827 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 2048K FFT length. Best time: 5.408 ms., avg time: 5.457 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 2560K FFT length. Best time: 6.644 ms., avg time: 6.709 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 3072K FFT length. Best time: 8.018 ms., avg time: 8.104 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 3584K FFT length. Best time: 9.744 ms., avg time: 9.861 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 4096K FFT length. Best time: 10.767 ms., avg time: 10.864 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 5120K FFT length. Best time: 13.628 ms., avg time: 13.740 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 6144K FFT length. Best time: 17.180 ms., avg time: 17.329 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 7168K FFT length. Best time: 19.999 ms., avg time: 20.658 ms.
[Work thread Jun 18 23:42] Timing 25 iterations of 8192K FFT length. Best time: 23.422 ms., avg time: 23.981 ms.

Perhaps not so interesting given that it's older hardware, but good to note that the dual cores are not impacted by DDR3-1333 at all. I thought I had been memory bound before, but I was wrong.
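The "not memory bound" observation can be checked against the numbers in this post: if DDR3-1333 were the bottleneck, the 2-cores speedup would collapse at the big FFTs. A small sketch using the best times quoted above:

```python
# Scaling efficiency of 2 threads on 2 physical cores vs 1 thread, from
# the i3-4170 best times quoted in this post.

def efficiency(t1_ms, t2_ms, cores=2):
    """Parallel efficiency: speedup divided by core count."""
    return (t1_ms / t2_ms) / cores

# 8192K FFT: 39.996 ms single-threaded vs 22.326 ms on both cores.
print(round(efficiency(39.996, 22.326), 2))  # ~0.9
```

Roughly 90% efficiency at the largest FFT is consistent with the cores, not the memory, being the limit on this dual-core part.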
#666
Jun 2016
1 Posts
My employer let me build a new workstation with an i7-6950X.

Specs:
Code:
[Wed Jun 22 09:13:23 2016] Compare your results to other computers at http://www.mersenne.org/report_benchmarks Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz CPU speed: 1864.58 MHz, 10 cores CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA L1 cache size: 32 KB L2 cache size: 256 KB, L3 cache size: 25 MB L1 cache line size: 64 bytes L2 cache line size: 64 bytes TLBS: 64 Prime95 64-bit version 28.9, RdtscTiming=1 Best time for 1024K FFT length: 5.010 ms., avg: 5.024 ms. Best time for 1280K FFT length: 4.859 ms., avg: 4.911 ms. Best time for 1536K FFT length: 5.884 ms., avg: 5.920 ms. Best time for 1792K FFT length: 7.196 ms., avg: 7.249 ms. Best time for 2048K FFT length: 8.002 ms., avg: 8.054 ms. Best time for 2560K FFT length: 10.269 ms., avg: 10.512 ms. Best time for 3072K FFT length: 12.585 ms., avg: 12.743 ms. Best time for 3584K FFT length: 15.504 ms., avg: 15.831 ms. Best time for 4096K FFT length: 18.191 ms., avg: 18.324 ms. Best time for 5120K FFT length: 23.629 ms., avg: 24.516 ms. Best time for 6144K FFT length: 29.791 ms., avg: 30.115 ms. Best time for 7168K FFT length: 35.978 ms., avg: 36.637 ms. Best time for 8192K FFT length: 41.911 ms., avg: 48.057 ms. Timing FFTs using 2 threads. Best time for 1024K FFT length: 2.601 ms., avg: 2.809 ms. Best time for 1280K FFT length: 2.744 ms., avg: 2.945 ms. Best time for 1536K FFT length: 3.077 ms., avg: 3.157 ms. Best time for 1792K FFT length: 3.834 ms., avg: 4.026 ms. Best time for 2048K FFT length: 5.140 ms., avg: 5.173 ms. Best time for 2560K FFT length: 6.094 ms., avg: 6.924 ms. Best time for 3072K FFT length: 7.553 ms., avg: 8.138 ms. Best time for 3584K FFT length: 9.036 ms., avg: 9.947 ms. Best time for 4096K FFT length: 10.487 ms., avg: 11.013 ms. Best time for 5120K FFT length: 13.678 ms., avg: 14.730 ms. Best time for 6144K FFT length: 16.925 ms., avg: 17.189 ms. Best time for 7168K FFT length: 18.428 ms., avg: 19.795 ms. Best time for 8192K FFT length: 23.605 ms., avg: 24.299 ms. 
Timing FFTs using 3 threads. Best time for 1024K FFT length: 1.659 ms., avg: 1.742 ms. Best time for 1280K FFT length: 2.581 ms., avg: 2.834 ms. Best time for 1536K FFT length: 3.040 ms., avg: 3.487 ms. Best time for 1792K FFT length: 3.332 ms., avg: 3.802 ms. Best time for 2048K FFT length: 3.939 ms., avg: 4.596 ms. Best time for 2560K FFT length: 4.491 ms., avg: 5.520 ms. Best time for 3072K FFT length: 5.225 ms., avg: 6.449 ms. Best time for 3584K FFT length: 6.344 ms., avg: 7.177 ms. Best time for 4096K FFT length: 7.342 ms., avg: 8.583 ms. Best time for 5120K FFT length: 9.497 ms., avg: 10.556 ms. Best time for 6144K FFT length: 11.696 ms., avg: 12.928 ms. Best time for 7168K FFT length: 14.071 ms., avg: 15.242 ms. Best time for 8192K FFT length: 14.720 ms., avg: 15.956 ms. Timing FFTs using 4 threads. Best time for 1024K FFT length: 1.183 ms., avg: 1.262 ms. Best time for 1280K FFT length: 1.818 ms., avg: 2.043 ms. Best time for 1536K FFT length: 2.346 ms., avg: 2.580 ms. Best time for 1792K FFT length: 2.873 ms., avg: 3.277 ms. Best time for 2048K FFT length: 3.189 ms., avg: 3.702 ms. Best time for 2560K FFT length: 3.482 ms., avg: 4.478 ms. Best time for 3072K FFT length: 4.980 ms., avg: 6.118 ms. Best time for 3584K FFT length: 4.863 ms., avg: 5.857 ms. Best time for 4096K FFT length: 6.378 ms., avg: 7.342 ms. Best time for 5120K FFT length: 7.293 ms., avg: 8.274 ms. Best time for 6144K FFT length: 9.018 ms., avg: 10.252 ms. Best time for 7168K FFT length: 10.839 ms., avg: 12.003 ms. Best time for 8192K FFT length: 12.682 ms., avg: 13.563 ms. Timing FFTs using 5 threads. Best time for 1024K FFT length: 1.366 ms., avg: 1.476 ms. Best time for 1280K FFT length: 1.676 ms., avg: 1.832 ms. Best time for 1536K FFT length: 2.203 ms., avg: 2.537 ms. Best time for 1792K FFT length: 2.633 ms., avg: 3.061 ms. Best time for 2048K FFT length: 2.712 ms., avg: 3.206 ms. Best time for 2560K FFT length: 3.296 ms., avg: 3.991 ms. 
Best time for 3072K FFT length: 3.696 ms., avg: 4.553 ms. Best time for 3584K FFT length: 4.296 ms., avg: 5.309 ms. Best time for 4096K FFT length: 4.840 ms., avg: 5.910 ms. Best time for 5120K FFT length: 5.958 ms., avg: 7.189 ms. Best time for 6144K FFT length: 7.988 ms., avg: 8.980 ms. Best time for 7168K FFT length: 8.853 ms., avg: 10.024 ms. Best time for 8192K FFT length: 10.258 ms., avg: 11.330 ms. Timing FFTs using 6 threads. Best time for 1024K FFT length: 1.361 ms., avg: 1.501 ms. Best time for 1280K FFT length: 1.843 ms., avg: 1.988 ms. Best time for 1536K FFT length: 1.714 ms., avg: 1.874 ms. Best time for 1792K FFT length: 2.357 ms., avg: 2.700 ms. Best time for 2048K FFT length: 2.444 ms., avg: 2.887 ms. Best time for 2560K FFT length: 2.870 ms., avg: 3.454 ms. Best time for 3072K FFT length: 3.190 ms., avg: 4.154 ms. Best time for 3584K FFT length: 3.940 ms., avg: 4.873 ms. Best time for 4096K FFT length: 4.169 ms., avg: 5.256 ms. Best time for 5120K FFT length: 5.218 ms., avg: 6.543 ms. Best time for 6144K FFT length: 6.240 ms., avg: 7.421 ms. Best time for 7168K FFT length: 7.545 ms., avg: 8.910 ms. Best time for 8192K FFT length: 8.722 ms., avg: 9.837 ms. Timing FFTs using 7 threads. Best time for 1024K FFT length: 1.156 ms., avg: 1.308 ms. Best time for 1280K FFT length: 1.651 ms., avg: 1.858 ms. Best time for 1536K FFT length: 1.576 ms., avg: 1.824 ms. Best time for 1792K FFT length: 2.075 ms., avg: 2.591 ms. Best time for 2048K FFT length: 1.841 ms., avg: 2.460 ms. Best time for 2560K FFT length: 2.533 ms., avg: 3.028 ms. Best time for 3072K FFT length: 2.907 ms., avg: 3.624 ms. Best time for 3584K FFT length: 3.522 ms., avg: 4.378 ms. Best time for 4096K FFT length: 3.664 ms., avg: 4.673 ms. Best time for 5120K FFT length: 4.722 ms., avg: 5.834 ms. Best time for 6144K FFT length: 5.536 ms., avg: 6.690 ms. Best time for 7168K FFT length: 6.521 ms., avg: 7.729 ms. Best time for 8192K FFT length: 7.587 ms., avg: 9.024 ms. 
Timing FFTs using 8 threads.
Best time for 1024K FFT length: 1.048 ms., avg: 1.120 ms.
Best time for 1280K FFT length: 1.345 ms., avg: 1.469 ms.
Best time for 1536K FFT length: 1.618 ms., avg: 1.846 ms.
Best time for 1792K FFT length: 1.929 ms., avg: 2.233 ms.
Best time for 2048K FFT length: 2.006 ms., avg: 2.307 ms.
Best time for 2560K FFT length: 2.455 ms., avg: 3.002 ms.
Best time for 3072K FFT length: 2.742 ms., avg: 3.401 ms.
Best time for 3584K FFT length: 3.168 ms., avg: 3.952 ms.
Best time for 4096K FFT length: 3.323 ms., avg: 4.260 ms.
Best time for 5120K FFT length: 4.069 ms., avg: 5.349 ms.
Best time for 6144K FFT length: 4.934 ms., avg: 6.162 ms.
Best time for 7168K FFT length: 5.784 ms., avg: 7.071 ms.
Best time for 8192K FFT length: 6.815 ms., avg: 7.912 ms.
Timing FFTs using 9 threads.
Best time for 1024K FFT length: 0.806 ms., avg: 0.876 ms.
Best time for 1280K FFT length: 1.271 ms., avg: 1.470 ms.
Best time for 1536K FFT length: 1.476 ms., avg: 1.732 ms.
Best time for 1792K FFT length: 1.839 ms., avg: 2.096 ms.
Best time for 2048K FFT length: 1.833 ms., avg: 2.100 ms.
Best time for 2560K FFT length: 2.383 ms., avg: 2.788 ms.
Best time for 3072K FFT length: 2.541 ms., avg: 3.026 ms.
Best time for 3584K FFT length: 3.094 ms., avg: 3.701 ms.
Best time for 4096K FFT length: 3.232 ms., avg: 4.024 ms.
Best time for 5120K FFT length: 3.966 ms., avg: 5.013 ms.
Best time for 6144K FFT length: 4.804 ms., avg: 5.902 ms.
Best time for 7168K FFT length: 5.403 ms., avg: 6.469 ms.
Best time for 8192K FFT length: 6.427 ms., avg: 7.618 ms.
Timing FFTs using 10 threads.
Best time for 1024K FFT length: 0.956 ms., avg: 1.025 ms.
Best time for 1280K FFT length: 1.244 ms., avg: 1.319 ms.
Best time for 1536K FFT length: 1.426 ms., avg: 1.643 ms.
Best time for 1792K FFT length: 1.554 ms., avg: 1.767 ms.
Best time for 2048K FFT length: 1.801 ms., avg: 2.402 ms.
Best time for 2560K FFT length: 2.081 ms., avg: 2.311 ms.
Best time for 3072K FFT length: 2.406 ms., avg: 2.933 ms.
Best time for 3584K FFT length: 2.542 ms., avg: 2.899 ms.
Best time for 4096K FFT length: 3.061 ms., avg: 3.676 ms.
Best time for 5120K FFT length: 3.612 ms., avg: 4.502 ms.
Best time for 6144K FFT length: 4.422 ms., avg: 5.500 ms.
Best time for 7168K FFT length: 5.148 ms., avg: 6.105 ms.
Best time for 8192K FFT length: 6.113 ms., avg: 7.234 ms.
Timings for 1024K FFT length (10 cpus, 10 workers): 6.57, 6.54, 6.70, 6.50, 6.75, 6.76, 6.75, 6.61, 6.61, 6.59 ms. Throughput: 1506.84 iter/sec.
Timings for 1280K FFT length (10 cpus, 10 workers): 8.63, 8.31, 8.35, 8.34, 8.36, 8.65, 8.51, 8.30, 8.31, 8.35 ms. Throughput: 1189.23 iter/sec.
Timings for 1536K FFT length (10 cpus, 10 workers): 9.95, 9.79, 9.96, 9.89, 9.66, 10.06, 10.01, 10.24, 9.76, 9.85 ms. Throughput: 1008.81 iter/sec.
Timings for 1792K FFT length (10 cpus, 10 workers): 11.58, 11.51, 11.50, 12.07, 11.56, 11.75, 11.60, 11.51, 11.65, 11.53 ms. Throughput: 860.38 iter/sec.
Timings for 2048K FFT length (10 cpus, 10 workers): 13.58, 13.28, 13.90, 13.26, 13.26, 13.89, 13.39, 13.17, 13.26, 13.39 ms. Throughput: 744.47 iter/sec.
Timings for 2560K FFT length (10 cpus, 10 workers): 17.71, 16.41, 16.46, 16.45, 16.42, 16.98, 16.65, 16.40, 16.39, 16.45 ms. Throughput: 601.54 iter/sec.
Timings for 3072K FFT length (10 cpus, 10 workers): 19.92, 19.62, 19.56, 19.66, 19.63, 21.97, 19.86, 20.04, 19.73, 19.67 ms. Throughput: 501.40 iter/sec.
[Wed Jun 22 09:18:27 2016]
Timings for 3584K FFT length (10 cpus, 10 workers): 23.27, 23.41, 23.01, 23.16, 23.10, 24.97, 23.76, 23.22, 22.94, 23.15 ms. Throughput: 427.62 iter/sec.
Timings for 4096K FFT length (10 cpus, 10 workers): 26.31, 26.31, 26.21, 26.42, 26.62, 27.58, 28.83, 26.46, 26.21, 26.89 ms. Throughput: 373.65 iter/sec.
Timings for 5120K FFT length (10 cpus, 10 workers): 33.71, 33.73, 33.31, 33.16, 33.56, 36.04, 34.43, 34.69, 33.84, 33.69 ms. Throughput: 294.13 iter/sec.
Timings for 6144K FFT length (10 cpus, 10 workers): 41.91, 41.32, 41.41, 41.41, 41.45, 43.62, 42.29, 42.35, 41.66, 41.52 ms. Throughput: 238.75 iter/sec.
Timings for 7168K FFT length (10 cpus, 10 workers): 48.99, 49.07, 48.48, 50.00, 48.35, 50.35, 49.62, 48.42, 49.54, 49.20 ms. Throughput: 203.28 iter/sec.
Timings for 8192K FFT length (10 cpus, 10 workers): 56.28, 55.49, 57.82, 55.58, 55.99, 57.60, 56.53, 55.49, 55.58, 57.00 ms. Throughput: 177.54 iter/sec.
 |
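For digging through a long benchmark dump like that, it can help to script the comparison. Here's a minimal sketch that parses Prime95's "Timing FFTs" output and picks the fastest thread count per FFT length; the sample lines are copied from the 4096K timings above, and the regexes only match the output format as shown here:

```python
import re
from collections import defaultdict

# Two chunks lifted from the benchmark output above.
log = """Timing FFTs using 9 threads.
Best time for 4096K FFT length: 3.232 ms., avg: 4.024 ms.
Timing FFTs using 10 threads.
Best time for 4096K FFT length: 3.061 ms., avg: 3.676 ms."""

best = defaultdict(dict)  # FFT length -> {thread count: best time in ms}
threads = None
for line in log.splitlines():
    m = re.match(r"Timing FFTs using (\d+) threads", line)
    if m:
        threads = int(m.group(1))
        continue
    m = re.match(r"Best time for (\d+K) FFT length: ([\d.]+) ms", line)
    if m:
        best[m.group(1)][threads] = float(m.group(2))

for fft, times in sorted(best.items()):
    t = min(times, key=times.get)
    print(f"{fft}: fastest with {t} threads ({times[t]} ms)")
```

Feeding it the full dump would show at a glance where adding threads stops paying off for each FFT size.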
|
|
|
|
|
#667 |
|
Serpentine Vermin Jar
Jul 2014
3,313 Posts |
Well, the new server arrived today, earlier than I expected.
I'm still installing the OS and what not but I had time to squeeze in a quick test. This is a dual 14-core E5-2690v4 and I'm comparing it to a dual 14-core E5-2697v3. They are very similar CPUs... both clocked at 2.6 GHz, both 14-core. And, as it turns out, both are using DDR4-2100. See, the server came with a certain amount of RAM (64GB) and I added another 192GB, which results in 2DPC on both CPUs. (16GB modules x a total of 16). Well... imagine my surprise when I looked at the lights-out instrumentation and saw it was only running at 2133 MHz even though it recognizes all of the DIMMs are 2400 MHz. Turns out, after digging in the quickspecs, that it only runs @ 2400 with 2DPC if you're using dual-ranked 16GB modules, and I have a bunch of single-ranked sticks. Oh well... that's what it came with also, so if I wanted dual ranks I would have had to get 4 more to replace the factory installed stuff (or spent more anyway on a custom order instead of the smart buy). Eh, it's kind of a nuisance but it's not really a big deal. And I got to compare the v3 and v4 CPUs even more directly, getting down to just what the CPU itself is bringing to the table. Turns out, quite a lot. ![]() I took the work from the v3 box and put it on the v4, started both up and took a screenshot side-by-side so you can see the difference. v4 on the left, v3 on the right. (Notice that Prime95 28.9 doesn't identify the new chip yet). Oh, I'll probably tweak the system and either remove the 2nd DIMM per channel or set it to mirroring mode (that may or may not run them at the equivalent of 1DPC ?) just to do another test @ 2400 to see how much more that helps. |
|
|
|
|
|
#668 |
|
Serpentine Vermin Jar
Jul 2014
3,313 Posts |
I didn't think simply setting the memory to spare/mirrored mode would make it run @ 2400, so I just removed the 2nd DIMM per channel. Here's the same view, but with the v4 CPU running DDR4-2400. Really not as dramatic a change.
|
|
|
|
|
|
#669 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts |
|
|
|
|
|
|
#670 | |
|
Serpentine Vermin Jar
Jul 2014
331310 Posts |
Quote:
![]() Code:
[Thu Jun 23 03:09:04 2016]
i: 00000000, EAX: 0000000D, EBX: 756E6547, ECX: 6C65746E, EDX: 49656E69
i: 00000001, EAX: 000406F1, EBX: 22200800, ECX: FEF87383, EDX: BFCBFBFF
i: 00000002, EAX: 76036301, EBX: 00F0B5FF, ECX: 00000000, EDX: 00C30000
i: 00000003, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000004, EAX: 3C004121, EBX: 01C0003F, ECX: 0000003F, EDX: 00000000
i: 00000004, EAX: 3C004122, EBX: 01C0003F, ECX: 0000003F, EDX: 00000000
i: 00000004, EAX: 3C004143, EBX: 01C0003F, ECX: 000001FF, EDX: 00000000
i: 00000004, EAX: 3C07C163, EBX: 04C0003F, ECX: 00006FFF, EDX: 00000002
i: 00000004, EAX: 3C004000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000005, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000006, EAX: 00000077, EBX: 00000002, ECX: 00000001, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00002BB9, ECX: 00000000, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000008, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 00000009, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 0000000A, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 0000000B, EAX: 00000001, EBX: 00000002, ECX: 00000100, EDX: 00000022
i: 0000000B, EAX: 00000005, EBX: 0000001C, ECX: 00000201, EDX: 00000022
i: 0000000B, EAX: 00000000, EBX: 00000000, ECX: 00000002, EDX: 00000022
i: 0000000B, EAX: 00000000, EBX: 00000000, ECX: 00000003, EDX: 00000022
i: 0000000B, EAX: 00000000, EBX: 00000000, ECX: 00000004, EDX: 00000022
i: 0000000C, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 0000000D, EAX: 00000007, EBX: 00000340, ECX: 00000340, EDX: 00000000
i: 80000000, EAX: 80000008, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 80000001, EAX: 00000000, EBX: 00000000, ECX: 00000121, EDX: 28100800
i: 80000002, EAX: 65746E49, EBX: 2952286C, ECX: 6F655820, EDX: 2952286E
i: 80000003, EAX: 55504320, EBX: 2D354520, ECX: 30393632, EDX: 2952286E
i: 80000003, EAX: 55504320, EBX: 2D354520, ECX: 30393632, EDX: 20347620
i: 80000004, EAX: 2E322040, EBX: 48473036, ECX: 0000007A, EDX: 00000000
i: 80000005, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000000
i: 80000006, EAX: 00000000, EBX: 00000000, ECX: 01006040, EDX: 00000000
i: 80000007, EAX: 00000000, EBX: 00000000, ECX: 00000000, EDX: 00000100
i: 80000008, EAX: 0000302E, EBX: 00000000, ECX: 00000000, EDX: 00000000
 |
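The five `i: 00000004` entries in that dump are Intel's deterministic cache parameters (CPUID leaf 04H), and the last one is the null terminator (cache type 0). The other four can be decoded by hand to recover the cache hierarchy: ways, partitions, and line size come from EBX, sets from ECX, cache level from EAX. A sketch using the values from the dump above:

```python
# CPUID leaf 04H entries (EAX, EBX, ECX) from the dump above; EDX is not
# needed to compute the cache sizes.
entries = [
    (0x3C004121, 0x01C0003F, 0x0000003F),  # L1 data
    (0x3C004122, 0x01C0003F, 0x0000003F),  # L1 instruction
    (0x3C004143, 0x01C0003F, 0x000001FF),  # L2 unified
    (0x3C07C163, 0x04C0003F, 0x00006FFF),  # L3 unified
]

sizes = []
for eax, ebx, ecx in entries:
    level = (eax >> 5) & 0x7                 # EAX[7:5]
    ways = ((ebx >> 22) & 0x3FF) + 1         # EBX[31:22] + 1
    partitions = ((ebx >> 12) & 0x3FF) + 1   # EBX[21:12] + 1
    line_size = (ebx & 0xFFF) + 1            # EBX[11:0] + 1
    sets = ecx + 1
    size = ways * partitions * line_size * sets
    sizes.append(size)
    print(f"L{level}: {size // 1024} KB ({ways}-way, {line_size}-byte lines)")
```

The last entry decodes to 35840 KB, i.e. the 35 MB shared L3 that the E5-2690 v4 spec sheet promises, with 32 KB L1d/L1i and 256 KB L2 per core.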
|
|
|
|
|
|
#671 | |
|
Serpentine Vermin Jar
Jul 2014
3,313 Posts |
Quote:
I was a little puzzled by the output of the benchmark throughput though... it was saying that the max throughput was with 28 threads/4 workers (as opposed to the 28 threads/2 workers that I would have picked as optimal). Turns out I was right, not the benchmark. When I actually set it up that way, with 4 workers of 7 cores each, it did pretty bad when all 4 were running. If I stopped all but one worker, that remaining worker sure took off and did well as expected, so it's definitely a case of interference with multiple workers on the same CPU; not enough mem bandwidth to go around. And that was with "only" 2400K FFT sized work. So, I'm sticking with what I can demonstrate is the ideal working setup... 1 worker using all cores on each CPU. ![]() After an overnight burn-in, since it's a new machine and I want to torture test it, I'll see how it does with some 100M digit stuff by way of comparison, or maybe see how it does with M49 since I can look back at my confirmation run on that to see how long that took on the previous gen CPUs. |
|
|
|
|
|
|
#672 |
|
Aug 2002
17410 Posts |
Would you post the 1w/14t throughput please for the 2690V4?
|
|
|
|
|
|
#673 |
|
Serpentine Vermin Jar
Jul 2014
1100111100012 Posts |
Yeah, although I don't have any confidence in the benchmark timings.
Code:
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (14 cpus, 1 worker): 2.22 ms. Throughput: 451.28 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (14 cpus, 2 workers): 3.62, 3.59 ms. Throughput: 554.82 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (14 cpus, 7 workers): 15.66, 15.37, 15.66, 18.62, 14.13, 14.49, 14.09 ms. Throughput: 457.31 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (14 cpus, 14 workers): 28.98, 29.16, 29.45, 28.96, 28.89, 28.92, 29.07, 28.96, 29.27, 29.52, 28.93, 28.95, 29.20, 29.48 ms. Throughput: 480.73 iter/sec.

FYI, I focused on just the 4M FFT size since that's around the current leading edge of first-time LL tests. Similarly, with all 28 cores (it's a dual-CPU system, after all), the benchmark tells the same story... it claims workers of 7 cores each are best, but again that was not the case in practice. Code:
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 1 worker): 2.42 ms. Throughput: 413.78 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 2 workers): 5.56, 5.44 ms. Throughput: 363.66 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 4 workers): 5.69, 5.26, 5.51, 6.02 ms. Throughput: 713.61 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 7 workers): 14.08, 15.61, 12.16, 15.31, 13.06, 16.80, 11.98 ms. Throughput: 502.18 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 14 workers): 28.38, 29.96, 27.59, 29.77, 19.85, 21.59, 20.19, 28.38, 27.04, 28.78, 29.24, 20.06, 21.79, 19.90 ms. Throughput: 571.84 iter/sec.
FFTlen=4096K, Type=0, Arch=0, Pass1=4194304, Pass2=0, clm=0 (28 cpus, 28 workers): 43.21, 43.40, 42.99, 43.45, 44.02, 43.51, 43.15, 45.82, 45.41, 45.46, 45.58, 46.21, 45.32, 44.86, 43.86, 45.31, 44.95, 43.48, 44.06, 43.94, 44.85, 45.71, 45.13, 45.53, 45.68, 45.36, 45.33, 45.59 ms. Throughput: 626.90 iter/sec.
 |
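For reference, the Throughput figure in these lines is just the per-worker iteration rates summed: each worker contributes 1000 divided by its per-iteration time in ms. Checking the 4-worker line above (the printed times are rounded, so the result lands within a hair of the reported value):

```python
# Per-worker iteration times (ms) from the "28 cpus, 4 workers" line above.
times_ms = [5.69, 5.26, 5.51, 6.02]

# Aggregate throughput: each worker does 1000/t iterations per second.
throughput = sum(1000.0 / t for t in times_ms)
print(f"{throughput:.2f} iter/sec")  # ~713.5, vs. the reported 713.61
```

This is also why a high "Throughput" can coexist with each individual worker being slow: the 28-worker line wins on aggregate even though every single test would take far longer to finish.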
|
|
|
|
|
#674 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×33×109 Posts |
It is worth noting that the second CPU didn't add that much compared with the first.
As I mentioned in the other thread, the CPUs are split into two NUMA nodes each. This could explain the 7-core behaviour. |
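Prime95 has its own affinity settings for this, but one way to sanity-check the NUMA theory is to keep each worker's threads on a single node and see whether the benchmark's 7-core prediction then holds up in practice. A minimal sketch using Linux's scheduler API; the cores-per-node numbering here is an assumption, so verify the real layout with `numactl --hardware` or /sys/devices/system/node first:

```python
import os

# Hypothetical layout: NUMA node 0 owns cores 0-6 (real numbering varies
# by BIOS/cluster-on-die settings). Intersect with the CPUs this process
# is actually allowed to use, so the sketch stays portable.
node0_cores = set(range(7)) & os.sched_getaffinity(0)

# Restrict this process (think: one 7-thread worker) to those cores, so
# its FFT traffic stays within one node's L3 slice and local memory.
os.sched_setaffinity(0, node0_cores)
print(sorted(os.sched_getaffinity(0)))
```

If the two-nodes-per-socket split is the culprit, a worker pinned like this should avoid the cross-node memory traffic that an unpinned 7-core worker would generate.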
|
|
|
|
|
#675 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts |
|
|
|
|