mersenneforum.org > Great Internet Mersenne Prime Search > Hardware

2014-07-27, 20:38   #1
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

14151₈ Posts
Odd scaling of test times between two machines

Hi all,

I've been puzzling over something the last few days and am hoping that maybe someone here with more intimate knowledge of the interactions between FFT computations and processor speed/cache size can shed some light on it.

In trying to juggle the allocation of two of my computers to various subprojects at the No Prime Left Behind and Conjectures 'R Us projects, I swapped subprojects on the two computers for a week or so and compared the test timings I saw. The results surprised me, given the comparative capabilities of the two CPUs.

The two machines are:
  • AMD Phenom II X4 N970 (Caspian) - 512 KB x 4 L2 cache, 2.2 GHz (but with some thermal throttling; it's a laptop that I haven't cleaned in a while)
  • Intel Core i5-2400 (Sandy Bridge) - 6 MB L3 cache, 3.1 GHz (max turbo 3.4 GHz, likely not throttling much if at all)

The two subprojects are:
  • NPLB 14th Drive, LLR primality tests of k*2^n-1, range k=600-1001, n=~1.3M
  • CRUS, PRP/N-1 primality tests (using LLR) of k*6^n-1, k=1597 and 36772, n=~1.68M - roughly equivalent to n=~4.35M base 2 in terms of decimal length
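The "roughly equivalent" figure follows from rewriting a power of 6 as a power of 2: 6^n = 2^(n·log2 6), so an exponent n in base 6 corresponds to about n × 2.585 in base 2. A minimal Python sketch of that conversion; the function names are illustrative, and the k multiplier is ignored because it only adds a few bits:

```python
import math


def equivalent_base2_exponent(base: int, n: float) -> float:
    """Exponent m such that 2**m is about the same size as base**n.

    base**n == 2**(n * log2(base)). The k multiplier adds only a few
    bits (log2(36772) is about 15) and is ignored here.
    """
    return n * math.log2(base)


def approx_decimal_digits(base: int, n: float) -> float:
    """Approximate decimal length of base**n, again ignoring k."""
    return n * math.log10(base)


n = 1.68e6
print(f"6^n with n = {n:,.0f} is about 2^{equivalent_base2_exponent(6, n):,.0f}")
print(f"roughly {approx_decimal_digits(6, n):,.0f} decimal digits")
```

For n = 1.68M this gives an equivalent base-2 exponent of about 4.34M, in line with the ~4.35M quoted.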

The N970 is (very roughly) equivalent in speed to a slow Core 2 Quad, while the i5-2400 is roughly 2.5-3x faster per core than a Core 2, thanks to its better instruction throughput and AVX.

For the CRUS base 6 project, an FFT size of 448K is selected by LLR. The NPLB base 2 work uses an FFT size of 96K. Both of these are sufficiently small to fit within L2/L3 cache, respectively; the base 6 work just barely fits within the AMD's L2.

Others at NPLB and CRUS have reported that, in general, Intel CPUs tend to do better than AMDs as FFT size increases. However, this is the exact opposite of what I'm seeing here! These are the test times I'm getting (approximately):
  • 61,000 seconds/test for CRUS base 6 on the AMD
  • 24,500 seconds/test for CRUS base 6 on the Intel
  • 3,500 seconds/test for NPLB base 2 on the AMD
  • 800 seconds/test for NPLB base 2 on the Intel

In relative terms, the AMD takes ~2.5x as long as the Intel on base 6 (the larger FFT), but ~4.38x as long on base 2 (the smaller FFT).
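The relative figures can be reproduced directly from the approximate timings listed above. This small Python sketch simply divides the AMD time by the Intel time for each workload; the dictionary layout and names are illustrative:

```python
def slowdown(amd_seconds: float, intel_seconds: float) -> float:
    """How many times longer the AMD takes than the Intel for the same test."""
    return amd_seconds / intel_seconds


# Approximate seconds per test, as reported in the post.
timings = {
    "CRUS base 6 (448K FFT)": (61_000, 24_500),
    "NPLB base 2 (96K FFT)": (3_500, 800),
}

for workload, (amd_s, intel_s) in timings.items():
    print(f"{workload}: AMD takes {slowdown(amd_s, intel_s):.2f}x as long")
```

This gives 2.49 and 4.38, matching the ~2.5x and ~4.38x figures.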

Does anyone know why this might be happening? Again, it runs completely counter to the conventional wisdom on how AMDs and Intels perform as FFT sizes increase. Indeed, others at NPLB/CRUS have reported results in line with the conventional wisdom, i.e. AMD K8 processors performing increasingly badly relative to Intel Core 2s as FFT size increased. I am quite thoroughly confused.

Are gwnum's non-base-2 FFTs not quite as heavily optimized for AVX, by chance? (I'm pretty sure both bases are using AVX FFTs on the Intel. I don't have physical access to it, so I can't tell you for sure, but I have another Sandy Bridge box with me and it's using AVX for the base 6 tests.)

Max

Last fiddled with by mdettweiler on 2014-07-27 at 20:40
2014-07-28, 03:49   #2
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

16266₈ Posts

Quote:
Originally Posted by mdettweiler
For the CRUS base 6 project, an FFT size of 448K is selected by LLR. The NPLB base 2 work uses an FFT size of 96K. Both of these are sufficiently small to fit within L2/L3 cache, respectively; the base 6 work just barely fits within the AMD's L2.
A 448K FFT uses 448K * 8 bytes, plus sin/cos and weighting data: around 4 MB in total.
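Spelling out the data part of that estimate, assuming "K" means 1024 elements and one 8-byte double per element (the sin/cos and weight tables are not counted here):

```latex
\begin{aligned}
448\text{K} \times 8\ \text{bytes} &= 448 \times 1024 \times 8\ \text{bytes} \\
&= 3{,}670{,}016\ \text{bytes} \\
&= 3.5\ \text{MiB}
\end{aligned}
```

The FFT data alone is therefore seven times the N970's 512 KB per-core L2, before any tables are added.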
2014-07-28, 05:55   #3
axn
Jun 2003

12F8₁₆ Posts

It is well known that AVX processors are very dependent on memory bandwidth (especially for larger FFTs). My 3 GHz Ivy Bridge can run a 448K FFT test of similar bit length (SR5) in just over half the time (about 13,500 s).

My best guess is that your memory is not running in dual-channel mode. Or it is just very slow. What is your memory spec/configuration?

Last fiddled with by axn on 2014-07-28 at 05:56
2014-07-28, 16:35   #4
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3×2,083 Posts

Ah, that would do it - I happen to know for a fact that the i5-2400 is not running in dual channel mode. (When I built it, I had to pick up the RAM last-minute at a local store, and all I could get my hands on at the time was a single 4 GB stick.)

The AMD, by contrast, has 8 GB of 665 MHz memory - not very fast, but it is running in dual channel mode (2x4 GB), as confirmed by CPU-Z.
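A back-of-the-envelope sketch of why the channel count matters: theoretical peak DRAM bandwidth is channels × 8 bytes (one 64-bit channel) × transfers per second, and DDR memory makes two transfers per clock. This assumes the 665 MHz figure is the DRAM clock as CPU-Z reports it. The thread does not give the i5-2400's memory speed, so the 666 MHz (DDR3-1333) value below is a hypothetical placeholder. Sustained bandwidth in practice is lower than these peaks.

```python
def peak_bandwidth_gbs(io_clock_mhz: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second).

    DDR transfers data on both clock edges, so the transfer rate is twice
    the DRAM clock. Each channel is 64 bits (8 bytes) wide.
    """
    transfers_per_second = 2 * io_clock_mhz * 1e6
    return channels * bus_bytes * transfers_per_second / 1e9


# AMD: 665 MHz DRAM clock, dual channel (2 x 4 GB), per CPU-Z.
amd = peak_bandwidth_gbs(665, channels=2)

# ASSUMPTION: hypothetical 666 MHz (DDR3-1333) clock for the Intel's single
# 4 GB stick; the real speed is not stated in the thread.
intel = peak_bandwidth_gbs(666, channels=1)

print(f"AMD, 2 channels: {amd:.1f} GB/s peak")
print(f"Intel, 1 channel (assumed clock): {intel:.1f} GB/s peak")
```

At equal clocks, going from one channel to two doubles the peak figure, so a fast AVX core on a single channel can end up waiting on memory once the FFT no longer fits in cache.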

That makes a whole lot of sense - thanks! I'll have to look into putting a second module in that i5...

Also, thanks George for the tidbit on FFT sizes - I forgot the need to multiply by 8. In that case, it seems that both machines are working outside of cache for the 448K FFT, which would explain why memory bandwidth matters so much there. The 96K FFT, by contrast, fits well within the Intel's 6 MB L3 cache but not within the AMD's 512 KB x 4 L2 cache, which is why the Intel does so much better there.
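The cache argument can be checked with a rough Python sketch. It counts only the FFT data (one 8-byte double per element), so it is a lower bound on the real footprint. It also assumes four simultaneous workers, one per core, on each machine; the thread does not say so. Under that assumption each worker has the AMD's private 512 KB L2, but only about a quarter of the Intel's shared 6 MB L3. All names here are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def fft_data_bytes(fft_len_k: int, bytes_per_element: int = 8) -> int:
    """Lower bound on an FFT's working set: one 8-byte double per element.

    gwnum also keeps sin/cos and weight tables, so the real footprint is
    larger (around 4 MB for a 448K FFT, per Prime95's post in this thread).
    """
    return fft_len_k * KIB * bytes_per_element


def fits_in_cache(fft_len_k: int, cache_bytes: int) -> bool:
    """True if the FFT data alone would fit in the given amount of cache."""
    return fft_data_bytes(fft_len_k) <= cache_bytes


# ASSUMPTION: four workers, one per core (not stated in the thread).
# The AMD's L2 is private per core; the Intel's L3 is shared by all cores.
WORKERS = 4
cache_per_worker = {
    "AMD N970 (L2, per core)": 512 * KIB,
    "Intel i5-2400 (share of L3)": 6 * MIB // WORKERS,
}

for fft_k in (96, 448):
    mib = fft_data_bytes(fft_k) / MIB
    for machine, cache in cache_per_worker.items():
        verdict = "fits" if fits_in_cache(fft_k, cache) else "does not fit"
        print(f"{fft_k}K FFT ({mib:.2f} MiB of data) {verdict} in {machine}")
```

Under that assumption, the 96K FFT fits only in the Intel's share of L3 and the 448K FFT fits in neither machine's cache, which is the pattern described in the post.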

Last fiddled with by mdettweiler on 2014-07-28 at 16:38

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.