Cannot run the torture test with this version... AMD FX-8120
|
mprime v27.2 :tu:
[CODE]PrimeNet success code with additional info: LL test successfully completes double-check of M26378767 [/CODE] |
Thanks for the hard work
George,
Thanks for the hard work on getting AVX working. From my brief review of Intel's documentation, the AVX instructions are a whole new beast compared to SSE, even though the concepts are similar and there is a lot of hardware overlap in the implementation (the xmm registers are mapped onto the low 128 bits of the 256-bit ymm registers, etc.). So I am sure that the port was not "trivial". Hopefully, when Intel extends AVX to 512 bits and then 1024 bits (Intel claims AVX is designed for that extensibility), the port will be easier. Of course, all that is many years away.

Sorry to hear about the Windows 7 woes. Not surprised, though. It seems to me that Microsoft's solution to OS problems is often to wipe the disk and start with a fresh OS.

I was wondering about your thoughts on the performance improvements seen so far with AVX. In theory, AVX should double the performance over SSE; however, Prime95 and mprime are seeing about a 25% improvement. (Every little bit helps, of course, so the work was worth it.) Still, given Intel's hype regarding AVX, that seems a little underwhelming. I haven't seen a whole lot of AVX vs. SSE performance comparisons for other applications, so I am wondering whether the results so far for Prime95 are typical. Do you know?

One thought as to why this is occurring is memory bandwidth: the current Sandy Bridge implementation may just need more time to get data into and out of the ymm registers. Perhaps the 64-bit optimizations will help, as you will have twice as many ymm registers to work with. Also, maybe Ivy Bridge will isolate and remove any AVX-related bottlenecks in the Sandy Bridge implementation, and applications will see better improvements of AVX over SSE in that future implementation.

Just a few of my somewhat uneducated musings regarding AVX. I know you are busy with your Windows 7 woes and the 64-bit optimizations, so no need to respond right away (or at all).

Thanks again for all the hard work with GIMPS. I have been following the progress almost since the beginning, but have only been working (actually letting my computers work) on GIMPS for the last year and a half or so.

Forrest |
[QUOTE=Forrest Lumpkin;283653] In theory, AVX should double the performance over SSE; however, Prime95 and mprime are seeing about a 25% improvement. (Every little bit helps, of course, so the work was worth it.) Still, given Intel's hype regarding AVX, that seems a little underwhelming. I haven't seen a whole lot of AVX vs. SSE performance comparisons for other applications, so I am wondering whether the results so far for Prime95 are typical. Do you know? One thought as to why this is occurring is memory bandwidth: the current Sandy Bridge implementation may just need more time to get data into and out of the ymm registers. [/QUOTE]
When you compare identically clocked Core 2 and Sandy Bridge systems, the improvement is more in line with expectations. Sandy Bridge has twice the memory bandwidth as well as twice the FPU throughput when using AVX. Prime95 v26 on Sandy Bridge already got a very nice speed bump thanks to the doubled memory bandwidth. Anyone care to do the comparison of 32-bit v26 prime95 on Core 2 vs. 32-bit v27 prime95 Sandy Bridge? Are we close to double the speed? I suppose the CPU benchmarks web page would have all the needed info. I have noticed one thing about Sandy Bridge. It seems to have a harder time hiding the latency when retrieving data from the L2 cache. It means I have to work harder at doing the maximum amount of work while data is in the L1 cache. |
1 Attachment(s)
[QUOTE=Prime95;283657]Anyone care to do the comparison of 32-bit v26 prime95 on Core 2 vs. 32-bit v27 prime95 Sandy Bridge? Are we close to double the speed? I suppose the CPU benchmarks web page would have all the needed info.[/QUOTE]
I found one pair of suitable comparison benchmarks, and scaled performance is close to double (1.6x-1.95x).
edit: If memory bandwidth is an issue, the i7-3930K obviously has an unfair advantage with quad-channel vs dual-channel. |
[QUOTE=James Heinrich;283660]If memory bandwidth is an issue, the i7-3930K obviously has an unfair advantage with quad-channel vs dual-channel.[/QUOTE]
True, but the bottleneck could be L3-to-L2 or L2-to-L1 bandwidth. |
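The gap discussed above, between a theoretical 2x from AVX and the roughly 25% seen in practice, can be put in rough numbers with Amdahl's law. The sketch below is only an illustrative model, not a measurement of Prime95: it assumes that some fraction of the runtime is limited purely by FPU throughput and gets the full 2x, while the rest (waiting on memory or cache) does not speed up at all. The function names are made up for this example.

```python
def overall_speedup(vector_fraction: float, vector_gain: float = 2.0) -> float:
    """Amdahl's law: only `vector_fraction` of the runtime gets `vector_gain` faster."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_gain)


def implied_fraction(observed_speedup: float, vector_gain: float = 2.0) -> float:
    """Invert Amdahl's law: which fraction of the runtime must have been sped up?"""
    if observed_speedup < 1.0 or observed_speedup > vector_gain:
        raise ValueError("observed_speedup must lie between 1 and vector_gain")
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / vector_gain)


if __name__ == "__main__":
    # 1.25 is the "about 25%" figure quoted in the thread; 2.0 is the
    # theoretical AVX-over-SSE gain. Both are inputs to the model, not results.
    fraction = implied_fraction(1.25)
    print(f"FPU-bound share implied by a 1.25x speedup: {fraction:.2f}")
    print(f"Speedup if that share were 0.80 instead: {overall_speedup(0.80):.2f}x")
```

Under these assumptions, a 1.25x overall gain corresponds to only about 40% of the runtime being FPU-bound. The model cannot say where the remaining time goes; it does not distinguish main-memory bandwidth from L3-to-L2 or L2-to-L1 transfers.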
[QUOTE=Dubslow;283429]9/9 + 4/4 half-AVX + 1 P-1 factor[/QUOTE]
12/12 + 4/4 half-AVX + 1 P-1 factor Results posted in thread. AVX code appears to be excellent. |
Sandy Bridge Performance
George,
Thanks for the information. I now understand something that has puzzled me a little over the past few months. The memory bandwidth increase on Sandy Bridge explains why v26 performs so well on my Sandy Bridge iMac compared to v26 on a Westmere machine that I also run it on! Thanks again for all the hard work. Forrest |
[QUOTE=Prime95;281148]A [B]very early pre-beta[/B] prime95 version 27.2 is available. It has support for most, but not all AVX FFT lengths. I have not done any of the 64-bit optimizations.
... I'd be happy to hear of bug reports. I'd be curious about Bulldozer benchmarks. A few double-check LL tests would improve my confidence in the FFT code. Next up for me is 64-bit optimizations. These things take time. Please be patient.[/QUOTE]
George, I'll be able to test on Bulldozer when the 64-bit Windows version is ready. I know you're busy, so I'm running P-1s on it right now. I'll move one of the workers to DC once installed. Jerry |
Can you run both this and 26.6 at the same time? I'd like to be able to use the current version so I can access the 10GB I have set for P-1's and the beta for the speedup on LL's.
|
[QUOTE=bcp19;284123]Can you run both this and 26.6 at the same time? [/QUOTE]
Sure. Use different folders. Make sure affinities are set so that the two programs aren't fighting for the same cores. |
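For anyone doing this with mprime on Linux, the advice above could be sketched as follows. This is one possible OS-level approach, not necessarily what the reply had in mind, which was more likely the program's own affinity settings. It assumes each version lives in its own folder with its own copy of the binary, and it uses `taskset -c` from util-linux to pin each copy to a disjoint set of cores. The folder paths, the core counts, and the helper names are all hypothetical.

```python
from pathlib import PurePosixPath


def split_cores(total_cores: int, first_share: int) -> tuple[list[int], list[int]]:
    """Split core IDs 0..total_cores-1 into two disjoint groups."""
    if not 0 < first_share < total_cores:
        raise ValueError("first_share must leave at least one core for each program")
    cores = list(range(total_cores))
    return cores[:first_share], cores[first_share:]


def pinned_command(install_dir: str, cores: list[int], binary: str = "mprime") -> list[str]:
    """Build a `taskset -c <cores> <binary>` command for one install folder."""
    core_list = ",".join(str(core) for core in cores)
    return ["taskset", "-c", core_list, str(PurePosixPath(install_dir) / binary)]


if __name__ == "__main__":
    # Hypothetical layout: 8 cores, 4 for the 26.6 install, 4 for the 27.2 beta.
    stable_cores, beta_cores = split_cores(total_cores=8, first_share=4)
    print(pinned_command("/opt/mprime-26.6", stable_cores))
    print(pinned_command("/opt/mprime-27.2", beta_cores))
```

The script only prints the two commands. To actually start the programs, each list could be passed to `subprocess.Popen` with `cwd` set to the matching folder, so that each copy keeps its own work and results files.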