6-core Sandy Bridge test case
In case anyone is interested, I just measured the differences in iteration times running Prime95 with five workers, versus five workers plus one helper thread on the last worker.
The affinity scramble is configured so that each of the Prime95 main or helper workers lands on its own physical core. What is special about this case is that the computer has a continuously running OpenCL job in the form of a Radeon HD 7790 loaded to almost 100%. The OpenCL does appear to steal quite a few CPU cycles (though in theory it does not have to). Exponent range: current first time LL front, 60M or so.
Here is the data (iteration times in seconds):
[code]
FFT    | 5 workers | 5+1 workers
----------------------------------
3360K  | 0.027     | 0.030
3M     | 0.024     | 0.027
3200K  | 0.025     | 0.028
3200K  | 0.025     | 0.028
3456K  | 0.025     | 0.016
----------------------------------
Average| 0.0258    | 0.0258
[/code]
Intel Core i7-3930K @ 3.20GHz, Quad Channel Memory (16 GB), Windows64, v27.9, build 1
Nothing is overclocked, though Turbo Boost is on, memory is on Intel standard timing (XMP is off).
The goal of this test was to ensure that giving another core to Prime95 would not slow down overall LL testing progress. As you can see, the average per worker iteration time is exactly the same in both configurations. Conclusion: for this machine, do not use the sixth core for a Prime95 helper thread. |
What speed is your memory? Even with quad channel it is quite possible that 6 cores (possibly 4-5) could max out your memory bandwidth.
|
[QUOTE=henryzz;359947]What speed is your memory? Even with quad channel it is quite possible that 6 cores (possibly 4-5) could max out your memory bandwidth.[/QUOTE]
Last September this computer passed 36 hours of Memtest86 at the XMP setting of 1600 MT/s, 9-9-9-24. A year later, Prime95 started throwing errors and going back to previous save files. I checked the memory: at least two sticks had become faulty. I replaced all the memory with a brand new set and, to be on the safe side, I will not be overclocking memory on this computer anymore. The current memory speed is 1333 MT/s, 9-9-9-24. |
[QUOTE=TObject;359942]As you can see, the average per worker iteration time is exactly the same in both configurations. Conclusion: for this machine, do not use the sixth core for a Prime95 helper thread.[/QUOTE]
First off, the average of the first column is 0.0252, not 0.0258. However, a raw average is not the correct way to do this -- you should be taking the harmonic mean. That is, calculate the number of iterations per second (1/t) for each worker, take the average of those rates, and invert it. When you do that, you end up with average iteration times of 0.0252 and 0.0245 respectively. That makes the second configuration the better one. |
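For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation described above. The function and variable names are made up for illustration, and the timings are copied from the table as it currently appears in the first post.

```python
# Iteration times in seconds, copied from the table in the first post.
five_workers = [0.027, 0.024, 0.025, 0.025, 0.025]
five_plus_one = [0.030, 0.027, 0.028, 0.028, 0.016]


def arithmetic_mean(times):
    """Plain average of the per-worker iteration times."""
    return sum(times) / len(times)


def harmonic_mean(times):
    """Average the rates (iterations per second), then invert."""
    rates = [1.0 / t for t in times]
    return 1.0 / (sum(rates) / len(rates))


for label, times in (("5 workers", five_workers),
                     ("5+1 workers", five_plus_one)):
    print(f"{label:12s} arithmetic={arithmetic_mean(times):.4f} "
          f"harmonic={harmonic_mean(times):.4f}")
```

The harmonic mean is the natural choice here because overall progress is the sum of the per-worker rates, not the sum of the per-iteration times.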
There are a few 6-core Sandy Bridge results on the [URL="http://mersenneforum.org/showthread.php?t=59&page=55"]Perpetual benchmark thread[/URL] ([URL="http://mersenneforum.org/showpost.php?p=351596&postcount=598"]3930K[/URL], [URL="http://mersenneforum.org/showpost.php?p=316257&postcount=568"]3960X[/URL], [URL="http://mersenneforum.org/showpost.php?p=282437&postcount=559"]3930K[/URL]). Most of them seem to show a faster time with 6 CPUs vs. 5. These may not match what you're looking at, however, since they were almost certainly run on idle computers.
|
Somebody changed the numbers in my original post. Very funny.
|
Make sure your RAM is in fact in quad channel. Based on the fact that you know what memtest86+ is and that you know something about frequencies and timings, you probably DO know.
If you want, you could clock your cores down 500 - 1000 MHz and re-do the test. If the iteration times go up, you're CPU limited. If they don't, you're RAM limited. Quad channel at 1333 MHz is going to be pushing it. I've got an i5-3570k @ 4.6 GHz and dual channel 2400 MHz 10-12-12-31 RAM and iteration times go up 20% with the fourth core running. Heavy memory bottleneck. You've got effectively 2666 MHz of dual channel RAM on six 3.4 (?) GHz cores, so you're roughly in the same ballpark as me. 10% faster memory bandwidth, 50% more cores but I have 33% higher frequency. |
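A back-of-the-envelope version of the comparison above, as a rough sketch. It assumes the usual 64-bit (8-byte) DDR3 channel and theoretical peak rates rather than measured bandwidth. The helper name is made up for illustration, and the 3.4 GHz figure is the guess from the post.

```python
BYTES_PER_TRANSFER = 8  # assumption: standard 64-bit DDR3 channel


def peak_bandwidth_gbs(channels, mega_transfers_per_s):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


quad_1333 = peak_bandwidth_gbs(4, 1333)  # i7-3930K system in this thread
dual_2400 = peak_bandwidth_gbs(2, 2400)  # i5-3570K system from the post

print(f"quad-channel 1333: {quad_1333:.1f} GB/s")
print(f"dual-channel 2400: {dual_2400:.1f} GB/s")
print(f"bandwidth ratio:   {quad_1333 / dual_2400:.2f}")
print(f"core count ratio:  {6 / 4:.2f}")
print(f"clock ratio:       {4.6 / 3.4:.2f}")  # 3.4 GHz is a guess in the post
```

Real-world bandwidth will be lower than these peak figures, so the ratios are only a rough guide to how much memory headroom each core gets.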
Thanks for your help, everybody. My original conclusion was wrong. Here is the fixed table:
[code]
FFT    | 5 workers | 5+1 workers
----------------------------------
3360K  | 0.027     | 0.030
3M     | 0.024     | 0.027
3200K  | 0.025     | 0.028
3200K  | 0.025     | 0.028
3456K  | 0.028     | 0.016
----------------------------------
Average| 0.0258    | 0.0258
GEOMEAN| 0.0258    | 0.0252
[b]HARMEAN[/b]|[b] 0.0257 [/b]|[b] 0.0245[/b]
[/code]
Iteration times in seconds. |
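The three summary rows of the fixed table can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the numbers: the dictionary layout is my own, and `geometric_mean` needs Python 3.8 or later.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Iteration times in seconds from the fixed table.
columns = {
    "5 workers": [0.027, 0.024, 0.025, 0.025, 0.028],
    "5+1 workers": [0.030, 0.027, 0.028, 0.028, 0.016],
}

# Round to four decimals to match the precision used in the table.
summary = {
    label: {
        "Average": round(mean(times), 4),
        "GEOMEAN": round(geometric_mean(times), 4),
        "HARMEAN": round(harmonic_mean(times), 4),
    }
    for label, times in columns.items()
}

for label, stats in summary.items():
    print(label, stats)
```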
[QUOTE=TObject;359942]What is special about this case is that the computer has a continuously running OpenCL job in the form of a Radeon HD 7790 loaded to almost 100%. The OpenCL does appear to steal quite a few CPU cycles (though in theory it does not have to).[/QUOTE]
Just curious, what driver do you have for it? |
[QUOTE=kracker;360019]Just curious, what driver do you have for it?[/QUOTE]
13-4 |
[QUOTE=TObject;360020]13-4[/QUOTE]
Ah. Above 13-1, there is a CPU "bug". Flooding mfakto out (low priority, etc.) will not affect output. |