#1
A Sunny Moo
Aug 2007
USA (GMT-5)
14151₈ Posts
Hi all,
I've been puzzling over something for the last few days and am hoping that someone here with a deeper understanding of how FFT computations interact with processor speed and cache size can shed some light on it.
While juggling the allocation of two of my computers among subprojects at the No Prime Left Behind (NPLB) and Conjectures 'R Us (CRUS) projects, I swapped subprojects between the two machines for a week or so and compared the test timings. The results surprised me, given the comparative capabilities of the two CPUs. The two machines are:
The two subprojects are:
The N970 is (very roughly) equivalent in speed to a slow Core 2 Quad, while the i5-2400 is roughly 2.5-3x faster per core than a Core 2, thanks to its better instruction throughput and AVX. For the CRUS base 6 work, LLR selects an FFT size of 448K. The NPLB base 2 work uses an FFT size of 96K. As far as I can tell, both are small enough to fit within each machine's cache (L2 on the AMD, L3 on the Intel); the base 6 work just barely fits within the AMD's L2. Others at NPLB and CRUS have reported that, in general, Intel CPUs tend to do better relative to AMDs as FFT size increases. However, this is the exact opposite of what I'm seeing here! These are the test times I'm getting (approximately):
In relative terms, the AMD is ~2.5x slower than the Intel on base 6 (the larger FFT), but ~4.38x slower than the Intel on base 2 (the smaller FFT). Does anyone know why this might be happening? Again, it runs completely counter to the conventional wisdom on how AMDs and Intels perform as FFT sizes increase. Indeed, others at NPLB/CRUS have reported results in line with the conventional wisdom, i.e. AMD K8 processors performing increasingly badly relative to Intel Core 2s as FFT size increased. I am quite thoroughly confused.
Are gwnum's non-base-2 FFTs perhaps not quite as heavily optimized for AVX? (I'm pretty sure both bases are using AVX FFTs on the Intel. I don't have physical access to it, so I can't tell you for sure, but I have another Sandy Bridge box with me and it's using AVX for the base 6 tests.)
Max
Last fiddled with by mdettweiler on 2014-07-27 at 20:40
#2
P90 years forever!
Aug 2002
Yeehaw, FL
16266₈ Posts
A 448K FFT uses 448K * 8 bytes plus sin/cos and weighting data -- around 4MB.
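As a rough check of that arithmetic, here is a minimal sketch that computes only the raw data size (FFT length times 8 bytes per double) for the two FFT sizes mentioned in this thread. The sin/cos and weighting tables are not modelled, since their exact size is internal to gwnum, and the function name `fft_data_bytes` is made up for illustration.

```python
def fft_data_bytes(fft_len_k: int, bytes_per_element: int = 8) -> int:
    """Raw data size of an FFT holding fft_len_k * 1024 elements.

    Excludes sin/cos and weighting tables, so this is a lower bound
    on the real working set.
    """
    return fft_len_k * 1024 * bytes_per_element


# The two FFT sizes discussed in the thread: 448K (base 6) and 96K (base 2).
for size_k in (448, 96):
    data = fft_data_bytes(size_k)
    print(f"{size_k}K FFT: {data} bytes = {data / 2**20:.2f} MiB of data")
```

The 448K case comes to 3.5 MiB of data alone, which is consistent with the "around 4MB" figure once the extra tables are added on top.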
#3
Jun 2003
12F8₁₆ Posts
It is well known that AVX processors are very dependent on memory bandwidth (especially for larger FFTs). My 3GHz Ivy Bridge can do a 448K FFT of similar bit length (SR5) in just over half the time (about 13500s).
My best guess is that your memory is not running in dual-channel mode, or that it is simply very slow. What is your memory spec/configuration?
Last fiddled with by axn on 2014-07-28 at 05:56
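To give a sense of scale for the dual-channel question, here is a small sketch of the usual theoretical peak bandwidth formula: transfers per second, times 8 bytes per transfer on a 64-bit channel, times the number of channels. DDR3-1333 is only an assumed example speed here, not a memory spec taken from this thread, and the function name is hypothetical. Real sustained bandwidth is lower than these peaks.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    bytes_per_transfer defaults to 8 because a DDR3 channel is 64 bits wide.
    """
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# ASSUMPTION: DDR3-1333 (1333 MT/s) is used purely as an example speed.
for channels in (1, 2):
    gb_s = peak_bandwidth_gb_s(1333, channels)
    print(f"{channels} channel(s): {gb_s:.1f} GB/s theoretical peak")
```

Under that assumption a single-channel setup has half the peak bandwidth of a dual-channel one, which matters most when the FFT working set is being streamed from main memory.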
#4
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts
Ah, that would do it. I happen to know for a fact that the i5-2400 is not running in dual-channel mode. (When I built it I had to pick up the RAM last-minute at a local store, and all I could get my hands on at the time was a single 4 GB stick.)
The AMD, by contrast, has 8 GB of 665 MHz memory. That is not very fast, but it is running in dual-channel mode (2x4 GB), as confirmed by CPU-Z. That makes a whole lot of sense, thanks! I'll have to look into putting a second module in that i5...
Also, thanks George for the tidbit on FFT sizes; I forgot the need to multiply by 8. In that case, it seems that both machines are running the 448K FFT out of main memory rather than out of cache, which would explain the memory bandwidth issues. The 96K FFT, by contrast, fits well within the Intel's 6 MB L3 cache but not within the AMD's 4x512 KB L2 cache, which is why the Intel does so much better there.
Last fiddled with by mdettweiler on 2014-07-28 at 16:38
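The cache reasoning above can be sketched as a simple comparison of working-set size against cache size. This is only a rough model under stated assumptions: it counts FFT data alone (8 bytes per element, no sin/cos or weighting tables), treats the Intel's 6 MB L3 as shared by all workers and the AMD's 512 KB L2 as private to each core, and assumes one test per core (four workers). The thread does not state how many tests run at once, so `WORKERS = 4` is a guess, and all names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

INTEL_L3_SHARED = 6 * MIB    # from the post: 6 MB L3, shared across cores
AMD_L2_PER_CORE = 512 * KIB  # from the post: 512 KB L2 per core
WORKERS = 4                  # ASSUMPTION: one test running on each core


def data_bytes(fft_len_k: int) -> int:
    """FFT data only (8-byte doubles); a lower bound on the working set."""
    return fft_len_k * KIB * 8


def fits(fft_len_k: int, cache_bytes: int, workers_sharing: int = 1) -> bool:
    """True if the combined data of all workers sharing the cache fits in it."""
    return data_bytes(fft_len_k) * workers_sharing <= cache_bytes


for size_k in (448, 96):
    intel = fits(size_k, INTEL_L3_SHARED, WORKERS)
    amd = fits(size_k, AMD_L2_PER_CORE)
    print(f"{size_k}K FFT: fits Intel L3 = {intel}, fits AMD L2 = {amd}")
```

Under these assumptions the 448K FFT spills out of cache on both machines, while the 96K FFT stays in the Intel's L3 but not in the AMD's L2. With a single worker the 448K data alone would fit in 6 MB, so the conclusion for the Intel depends on how many tests run concurrently.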