I'm seeing similar results. My i5-2500K has dropped from .032 to .022 per iteration on 1 core, and P-1 Stage 2 dropped from 450 sec to 355 sec, also on 1 core. That is with 2 instances of mfaktc running on 2 of the other cores and no OC.
Huge improvement, great job! Doug |
Here's what I'm seeing for iteration times on my i5-2500K running 2 instances of Prime95 (on M26161123 and M26161217) and 2 instances of mfaktc. (The chip is overclocked to 4.5 GHz.)
26.6 (64-bit) --> 12.4 ms
27.2 (32-bit) --> 9.8 ms
27.3 (32-bit) --> 9.6 ms
27.3 (64-bit) --> 9.1 ms
Nice improvement!! |
[QUOTE=James Heinrich;289742]With hyperthreading disabled:
1: 12.8ms
2: 13.2ms
3: 13.6ms
4: 14.8ms
5: 16.7ms
6: ~19ms (ranges from 18.3 to 21.1 in different workers)[/QUOTE]
I would be intrigued to see, if it's not too awkward to run, what the speed-as-you-add-more-workers progression is for v26.6.3. |
[QUOTE=fivemack;289797]I would be intrigued to see, if it's not too awkward to run, what the speed-as-you-add-more-workers progression is for v26.6.3.[/QUOTE]
v26.6.3 vs v27.1.3 (both 64-bit):
1: 21.1ms vs 12.8ms (65% faster)
2: 21.2ms vs 13.2ms (61% faster)
3: 21.4ms vs 13.6ms (57% faster)
4: 21.5ms vs 14.8ms (45% faster)
5: 22.0ms vs 16.7ms (32% faster)
6: 22.6ms vs 19.0ms (19% faster)
[b]edit:[/b] just realized I ran the 27.1.3 with Hyperthreading disabled, and v26.6.3 with it enabled :sad: Will re-run benchmarks later. |
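For anyone checking the percentages, here is a small Python sketch of how the "% faster" figures above appear to be derived: the ratio of old to new iteration time, minus one. The function name `pct_faster` is just an illustrative choice, and the timings are the ones quoted in the post (which, per the edit note, mixed hyperthreading settings).

```python
# Iteration times in ms per worker count, copied from the post above:
# worker count -> (v26.6.3 time, v27.1.3 time)
TIMINGS_MS = {
    1: (21.1, 12.8),
    2: (21.2, 13.2),
    3: (21.4, 13.6),
    4: (21.5, 14.8),
    5: (22.0, 16.7),
    6: (22.6, 19.0),
}


def pct_faster(old_ms, new_ms):
    """Percent increase in throughput when an iteration drops from old_ms to new_ms."""
    return (old_ms / new_ms - 1.0) * 100.0


for workers, (old_ms, new_ms) in TIMINGS_MS.items():
    print(f"{workers}: {old_ms}ms vs {new_ms}ms ({pct_faster(old_ms, new_ms):.0f}% faster)")
```

Note that "65% faster" here means 65% more iterations per unit time, not a 65% reduction in time per iteration.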
These numbers are a bit confusing to analyse because everyone seems to be running their machines at different clock speeds and memory speeds.
I appreciate that it involves multiple reboots and makes running other jobs difficult while you're doing it, but I think that a conclusive analysis of the effect of memory bandwidth really would benefit from benchmarks from 27.3 at two CPU multipliers as far apart as possible, with memory speed kept the same, and turbo and hyperthreading turned off in both cases. (Ideally there would also be data points at two different memory speeds with the CPU multiplier kept the same, but I don't know if X79 BIOSes allow you to set that conveniently.)
The idea is to model runtime as A + B/cpuspeed + C/memoryspeed, solve for the coefficients, and see if anything interesting shows up in the values of A, B and C. I've done this analysis with the SPEC99 benchmarks to divide them into CPU-intensive and memory-intensive ones. |
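A minimal sketch of the fit described above, in Python, assuming a plain least-squares solve of runtime = A + B/cpuspeed + C/memoryspeed. The function name `fit_runtime_model` and the sample numbers are hypothetical: the data below is synthetic, generated from known coefficients purely to show the method recovers them, and is not a measurement from anyone's machine.

```python
def fit_runtime_model(samples):
    """Least-squares fit of runtime = A + B/cpu_speed + C/mem_speed.

    samples: list of (cpu_speed, mem_speed, runtime) tuples; needs at least
    two distinct CPU speeds and two distinct memory speeds.
    Returns (A, B, C).
    """
    rows = [(1.0, 1.0 / cpu, 1.0 / mem) for cpu, mem, _ in samples]
    ys = [runtime for _, _, runtime in samples]
    n = 3
    # Normal equations (X^T X) beta = X^T y, stored as an augmented 3x4 matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("not enough distinct CPU/memory speeds to fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][n] / m[i][i] for i in range(n))


# Synthetic example: speeds in GHz (memory as MHz / 1000), runtimes in ms,
# generated from made-up "true" coefficients A=1, B=30, C=5.
TRUE_A, TRUE_B, TRUE_C = 1.0, 30.0, 5.0
synthetic = [
    (cpu, mem, TRUE_A + TRUE_B / cpu + TRUE_C / mem)
    for cpu in (3.3, 4.5)
    for mem in (1.333, 2.133)
]
a, b, c = fit_runtime_model(synthetic)
print(f"A={a:.3f}  B={b:.3f}  C={c:.3f}")
```

With real benchmarks, a large C relative to B would suggest the workload is memory-bound at that FFT size, and a large B would suggest it is CPU-bound. Keeping both speeds in similar units (here GHz) keeps the small linear system well conditioned.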
Linux executables should be available. Untested. Sometimes Primenet doesn't recognize new versions, but I have to do some evening entertaining right now.
P.S. This is the second time Ubuntu has toasted the root disk. Arcane fsck command restored it both times. I don't know how a novice user would ever recover... |
[QUOTE=fivemack;289802]I think that a conclusive analysis of the effect of memory bandwidth really would benefit from benchmarks from 27.3 at two CPU multipliers as far apart as possible, with memory speed kept the same, and turbo and hyperthreading turned off in both cases. (really ideally would also be data points at two different memory speeds with CPU multiplier kept the same, but I don't know if X79 BIOSes allow you to set that conveniently)[/QUOTE]
I'll see if I can get you this data tomorrow.
|
[QUOTE=fivemack;289723]Maybe I'm misunderstanding the request, but I think the question is whether there's a slowdown running six one-thread workers on six different jobs[/QUOTE]
You are right, the memory traffic would be higher in that case. There is no slowdown for me when running 4 different workers doing 4 different jobs on 4 physical cores (alone, or used as 8 logical cores, with helpers). Compared with 27.2, I would say it is a touch faster, and it is much faster than 26.5/26.6. |
i5-2500K, stock speed, Windows 7 Home Premium
[code]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
CPU speed: 3336.07 MHz, 4 cores
CPU features: Prefetch, MMX, SSE, SSE2, SSE4, AVX
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 27.3, RdtscTiming=1
Best time for 768K FFT length: 4.720 ms., avg: 4.901 ms.
Best time for 896K FFT length: 5.770 ms., avg: 5.987 ms.
Best time for 1024K FFT length: 6.469 ms., avg: 6.606 ms.
Best time for 1280K FFT length: 8.261 ms., avg: 8.452 ms.
Best time for 1536K FFT length: 10.157 ms., avg: 10.394 ms.
Best time for 1792K FFT length: 12.174 ms., avg: 12.454 ms.
Best time for 2048K FFT length: 13.578 ms., avg: 13.884 ms.
Best time for 2560K FFT length: 17.210 ms., avg: 17.559 ms.
Best time for 3072K FFT length: 21.431 ms., avg: 21.703 ms.
Best time for 3584K FFT length: 25.954 ms., avg: 26.447 ms.
Best time for 4096K FFT length: 29.306 ms., avg: 29.481 ms.
Best time for 5120K FFT length: 37.961 ms., avg: 38.296 ms.
Best time for 6144K FFT length: 45.709 ms., avg: 47.362 ms.
Best time for 7168K FFT length: 55.606 ms., avg: 55.943 ms.
Best time for 8192K FFT length: 63.926 ms., avg: 64.368 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 2.611 ms., avg: 2.691 ms.
Best time for 896K FFT length: 3.123 ms., avg: 3.200 ms.
Best time for 1024K FFT length: 3.512 ms., avg: 3.643 ms.
Best time for 1280K FFT length: 4.557 ms., avg: 4.876 ms.
Best time for 1536K FFT length: 5.513 ms., avg: 5.684 ms.
Best time for 1792K FFT length: 6.637 ms., avg: 6.861 ms.
Best time for 2048K FFT length: 7.410 ms., avg: 7.569 ms.
Best time for 2560K FFT length: 9.301 ms., avg: 9.652 ms.
Best time for 3072K FFT length: 11.599 ms., avg: 11.857 ms.
Best time for 3584K FFT length: 14.025 ms., avg: 14.427 ms.
Best time for 4096K FFT length: 15.921 ms., avg: 16.137 ms.
Best time for 5120K FFT length: 20.980 ms., avg: 23.371 ms.
Best time for 6144K FFT length: 24.257 ms., avg: 24.651 ms.
Best time for 7168K FFT length: 29.402 ms., avg: 29.815 ms.
Best time for 8192K FFT length: 34.335 ms., avg: 34.642 ms.
Timing FFTs using 3 threads.
Best time for 768K FFT length: 1.838 ms., avg: 1.916 ms.
Best time for 896K FFT length: 2.223 ms., avg: 2.361 ms.
Best time for 1024K FFT length: 2.547 ms., avg: 3.664 ms.
Best time for 1280K FFT length: 3.276 ms., avg: 3.400 ms.
Best time for 1536K FFT length: 4.017 ms., avg: 4.120 ms.
Best time for 1792K FFT length: 4.768 ms., avg: 4.914 ms.
Best time for 2048K FFT length: 5.392 ms., avg: 5.509 ms.
Best time for 2560K FFT length: 6.934 ms., avg: 7.173 ms.
Best time for 3072K FFT length: 8.511 ms., avg: 8.696 ms.
Best time for 3584K FFT length: 10.403 ms., avg: 10.938 ms.
Best time for 4096K FFT length: 11.650 ms., avg: 12.018 ms.
Best time for 5120K FFT length: 14.835 ms., avg: 15.071 ms.
Best time for 6144K FFT length: 17.789 ms., avg: 18.049 ms.
Best time for 7168K FFT length: 21.164 ms., avg: 21.397 ms.
Best time for 8192K FFT length: 24.641 ms., avg: 25.339 ms.
Timing FFTs using 4 threads.
Best time for 768K FFT length: 1.670 ms., avg: 1.723 ms.
Best time for 896K FFT length: 2.032 ms., avg: 2.079 ms.
Best time for 1024K FFT length: 2.305 ms., avg: 2.403 ms.
Best time for 1280K FFT length: 2.997 ms., avg: 3.066 ms.
Best time for 1536K FFT length: 3.637 ms., avg: 5.094 ms.
Best time for 1792K FFT length: 4.386 ms., avg: 4.509 ms.
Best time for 2048K FFT length: 4.895 ms., avg: 7.237 ms.
Best time for 2560K FFT length: 6.309 ms., avg: 6.483 ms.
Best time for 3072K FFT length: 7.560 ms., avg: 7.742 ms.
Best time for 3584K FFT length: 9.366 ms., avg: 9.645 ms.
Best time for 4096K FFT length: 10.515 ms., avg: 11.590 ms.
Best time for 5120K FFT length: 13.031 ms., avg: 13.184 ms.
[Fri Feb 17 22:29:50 2012]
Best time for 6144K FFT length: 15.449 ms., avg: 15.707 ms.
Best time for 7168K FFT length: 18.263 ms., avg: 18.686 ms.
Best time for 8192K FFT length: 21.290 ms., avg: 21.571 ms.
Best time for 61 bit trial factors: 2.294 ms.
Best time for 62 bit trial factors: 2.309 ms.
Best time for 63 bit trial factors: 2.607 ms.
Best time for 64 bit trial factors: 2.698 ms.
Best time for 65 bit trial factors: 3.169 ms.
Best time for 66 bit trial factors: 3.740 ms.
Best time for 67 bit trial factors: 3.709 ms.
Best time for 75 bit trial factors: 3.614 ms.
Best time for 76 bit trial factors: 3.596 ms.
Best time for 77 bit trial factors: 3.633 ms.
[/code]
tl;dr: more than 3 cores on one task is useless |
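To put a number on the thread scaling in the benchmark above, here is a short Python sketch that computes speedup and parallel efficiency from the "Best time" figures for two FFT lengths. The names `BEST_MS` and `scaling` are illustrative only, and best times are used rather than averages because a few averages in the log look like outliers.

```python
# Best times in ms for 1, 2, 3 and 4 threads, copied from the benchmark above.
BEST_MS = {
    "2048K": [13.578, 7.410, 5.392, 4.895],
    "8192K": [63.926, 34.335, 24.641, 21.290],
}


def scaling(times_ms):
    """Return (threads, speedup vs 1 thread, parallel efficiency) for each entry."""
    base = times_ms[0]
    return [(n, base / t, base / (t * n)) for n, t in enumerate(times_ms, start=1)]


for fft, times in BEST_MS.items():
    for threads, speedup, efficiency in scaling(times):
        print(f"{fft} FFT, {threads} thread(s): "
              f"{speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

On these figures the fourth thread cuts the 2048K time by only about 10% over three threads (5.392 ms to 4.895 ms), which is the diminishing return the tl;dr is pointing at.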
[QUOTE=firejuggler;289840]i5-2500K, stock speed , windows 7 home premium[/QUOTE]
Useful data - what's the memory speed here? (I think you can see something like a memory-bandwidth effect by comparing this to the 4429/2133 i5-2500K data and seeing how the speed ratio changes as the number of threads goes up: the 4429/2133 machine is 48% faster at 4 threads and only 34% faster at 1 thread. But that would imply that the memory on this i5-2500K is 1333 MHz, so if it isn't I'll have to revise my analysis.) |
Yes, memory speed is 1333 MHz.
|