[QUOTE=pepi37;346474]And what will the performance increase be with AVX2? (If there is any.)[/QUOTE]
AVX2 is all about integer operations. Prime95 FFTs use floating-point operations. I could double the TF speed using AVX2, but one should really be using a Haswell system for LL instead. |
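For readers new to the jargon: TF is trial factoring, which tests candidate divisors q = 2kp + 1 of the Mersenne number 2^p − 1 by checking whether 2^p mod q equals 1. That is pure integer modular arithmetic, which is why wider integer SIMD (AVX2) can speed up TF but not the floating-point FFTs used for LL tests. Below is a plain textbook sketch, not Prime95's implementation (which sieves candidates and uses multi-word arithmetic in hand-written assembly); the function names `pow2_mod` and `trial_factor` are hypothetical.

```python
def pow2_mod(p, q):
    """Compute 2**p mod q by left-to-right binary exponentiation.

    Every step is an integer squaring or doubling followed by a reduction mod q.
    """
    result = 1
    for bit in bin(p)[2:]:          # exponent bits, most significant first
        result = result * result % q
        if bit == "1":
            result = result * 2 % q
    return result


def trial_factor(p, k_max):
    """Return the divisors q = 2*k*p + 1 (1 <= k <= k_max) of 2**p - 1.

    Every prime factor of 2**p - 1 (p prime) has this form and is 1 or 7
    mod 8, so other candidates are skipped cheaply. Divisors found this way
    are not guaranteed to be prime.
    """
    factors = []
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue
        if pow2_mod(p, q) == 1:     # q divides 2**p - 1
            factors.append(q)
    return factors


print(trial_factor(11, 10))  # [23, 89], since 2**11 - 1 = 2047 = 23 * 89
```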
[QUOTE=Prime95;346487]AVX2 is all about integer operations.[/QUOTE]
Aside from the FMA3 support, you mean? ;) [I know, strictly speaking these are separate additions to the ISA, but since they appeared together in the same chip release I usually think of both the 256-bit-wide SIMD ints and the FMA3 as "AVX2".] Of course for AMD, FMA support is "SSE5", and it's FMA4, not FMA3. Clear as mud. |
And there are separate CPUID flags for FMA and AVX2. Yes, clear as mud.
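The separate flags are visible in the feature bits themselves. On x86, FMA3 is reported in CPUID leaf 1, ECX bit 12; AVX2 in leaf 7 (subleaf 0), EBX bit 5; and AMD's FMA4 in extended leaf 0x80000001, ECX bit 16. The Python sketch below only decodes raw register values you have already obtained (for example from a small C program using a compiler's `cpuid` intrinsic, or from a CPU-info tool); the table layout and the `decode_features` name are illustrative assumptions. A production check for AVX2 should also confirm that the OS saves the YMM state (OSXSAVE/XGETBV), which this sketch skips.

```python
# (leaf, register, bit) for each feature flag, per the x86 CPUID layout.
FEATURE_BITS = {
    "FMA3": (0x00000001, "ecx", 12),
    "AVX2": (0x00000007, "ebx", 5),   # leaf 7, subleaf 0
    "FMA4": (0x80000001, "ecx", 16),  # AMD extended leaf
}


def decode_features(regs):
    """Map each feature name to True/False.

    `regs` is a dict keyed by (leaf, register_name) holding the raw 32-bit
    register values returned by CPUID; missing entries count as 0.
    """
    return {
        name: bool((regs.get((leaf, reg), 0) >> bit) & 1)
        for name, (leaf, reg, bit) in FEATURE_BITS.items()
    }


# Hypothetical register values for a CPU with FMA3 and AVX2 but no FMA4.
example = {(0x1, "ecx"): 1 << 12, (0x7, "ebx"): 1 << 5}
print(decode_features(example))  # {'FMA3': True, 'AVX2': True, 'FMA4': False}
```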
|
[QUOTE=Prime95;346462]Notice the temp increase on the small FFT torture test! These small FFTs operate out of the L2 cache. I'm going to try a really small torture test that operates out of the L1 cache to see if I can get the temps even higher.[/QUOTE]
:loco: Wouldn't it be much simpler to disconnect your fans? |
But shouldn't there be a massive slowdown on four workers if the memory is bottlenecked even at two workers?
As I said in my previous post, my timings were 11, 11, 11, and finally 14 milliseconds once the fourth worker was added. In George's test, the times do get progressively longer, yes, but the bottleneck begins right at two workers. |
@George,
Was the system stable enough @4.2 GHz to be considered fit for day-to-day crunching (day-to-day meaning 24/7 LL testing)? And did you take any power consumption measurements at that speed? Thx |
[QUOTE=lycorn;346520]Was the system stable enough @4.2 GHz to be considered fit for day-to-day crunching? And did you take any power consumption measurement at that speed?[/QUOTE]
Yes, I successfully completed about 10 double-checks at that speed before switching to first-time tests (version 27.9). I have not taken any power consumption measurements. |
[QUOTE=TheMawn;346498]But shouldn't there be a massive slowdown on four workers if the memory is bottlenecked even at two workers?[/QUOTE]
Let's do a "what if". What if each iteration did no FPU operations, that is, all it did was read and write memory? One worker would take about 5 ms per iteration. Now when you start the second worker, *every* time it wants to read or write memory it must wait, so each worker will now take 10 ms. With a third worker, 15 ms; with a fourth, 20 ms. Prime95 isn't as bad as this what-if case. Instead of worker two waiting *every* time it accesses memory, it only has to wait, say, 5 percent of the time: a partial slowdown. The key point is that the faster the CPU portion of Prime95, the bigger the penalty you'll see as you add another worker. |
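George's what-if can be turned into a toy model. Assume each single-worker iteration spends `compute_ms` on the CPU and `memory_ms` on memory traffic, that the compute part scales perfectly across cores, and that each extra worker adds `wait_fraction * memory_ms` of waiting per iteration (1.0 is the worst case where every access waits, as in the pure-memory example). These parameters and the linear form are simplifying assumptions for illustration, not measurements of Prime95.

```python
def iteration_ms(compute_ms, memory_ms, workers, wait_fraction=1.0):
    """Toy per-iteration time when `workers` processes share one memory bus.

    Each additional worker makes this worker wait a further
    wait_fraction * memory_ms per iteration (1.0 = every access waits).
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    return compute_ms + memory_ms * (1 + (workers - 1) * wait_fraction)


def slowdown(compute_ms, memory_ms, workers, wait_fraction=1.0):
    """Ratio of the n-worker iteration time to the single-worker time."""
    return (iteration_ms(compute_ms, memory_ms, workers, wait_fraction)
            / iteration_ms(compute_ms, memory_ms, 1, wait_fraction))


# The pure-memory what-if: no FPU work, 5 ms of memory traffic per iteration.
print([iteration_ms(0, 5, n) for n in range(1, 5)])  # [5.0, 10.0, 15.0, 20.0]

# Same memory cost and partial waiting; compare a slow and a fast CPU portion.
# The faster CPU portion (5 ms) shows the larger relative penalty.
for compute in (20.0, 5.0):
    print(compute, [round(slowdown(compute, 5.0, n, 0.2), 3) for n in range(1, 5)])
```

Lowering `compute_ms` while holding the memory cost fixed raises the ratio, which is George's point. The model is linear and never saturates, so it will not reproduce a flat-then-jump pattern like 11, 11, 11, 14 ms; that would need an explicit bandwidth-limit term.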
i7-4960X tested
[url]http://www.tomshardware.com/reviews/core-i7-4960x-ivy-bridge-e-benchmark,3557-5.html[/url] |
I created a torture test that uses really small FFTs that fit in the L1 data cache. Alas, it runs cooler than the FFTs that fit in the L2 cache.
|
L2 caches are so much larger that overall die heating is probably greater when all levels of the on-chip memory hierarchy are busy.
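To put a rough number on "so much larger": assuming Haswell's per-core sizes of 32 KiB L1 data cache and 256 KiB L2, and 8 bytes per double-precision FFT element, the largest FFT whose data alone fits is 4K elements in L1 versus 32K in L2, an 8x difference. Real torture-test FFTs also need room for twiddle factors and other working data, so practical limits are smaller; the `overhead_fraction` parameter below is a hypothetical knob for that, and these numbers are not the sizes the torture test actually uses.

```python
BYTES_PER_DOUBLE = 8

# Assumed per-core cache sizes for a Haswell core (bytes).
CACHES = {"L1d": 32 * 1024, "L2": 256 * 1024}


def max_fft_doubles(cache_bytes, overhead_fraction=0.0):
    """Largest number of doubles that fit, after reserving a fraction of the
    cache for other data such as twiddle tables (hypothetical parameter)."""
    usable = cache_bytes * (1.0 - overhead_fraction)
    return int(usable // BYTES_PER_DOUBLE)


for name, size in CACHES.items():
    print(name, max_fft_doubles(size))
# L1d 4096
# L2 32768
```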
|