mersenneforum.org


Prime95 2010-06-22 18:36

Any ideas on v26 FFT selection algorithm?
 
I'm working on the code that selects which FFT implementation to run for a given FFT size. Kinda stuck.

V25 uses a hardwired selection process based on the L2+L3 cache size.

Preliminary benchmarks on my Core 2 and Core i7 boxes indicate that this isn't optimal.

I could add hardwired choices for Core i7 too, but as different cache configurations come out and next-generation chips are released, even more hardwired choices might need to be added. So maybe a more dynamic system is needed, but that runs into other difficulties. For example, other programs might stop and start during the mini-benchmark that determines the best FFT implementation. Worse yet (and this also applies to a hardwired choice), the best single-worker FFT implementation might not be the same as the best multi-worker FFT implementation.

At this point, I'm leaning toward an expanded hardwired approach, although I'm willing to entertain interesting ideas.

Prime95 2010-06-22 18:41

I could use a few more v26 benchmarks. My Core 2 machine is a Mac, which does not give the most accurate timings. So a [B]few[/B] Core 2 and i7 benchmarks would be useful. I'd also be curious about AMD K10 timings (though I haven't worked on any AMD optimizations yet).

A benchmark will take a few hours. Please have little to nothing else running.

Download (but don't overwrite your existing prime95!):

[url]ftp://mersenne.org/gimps/test_v26_32.zip[/url]
[url]ftp://mersenne.org/gimps/test_v26_64.zip[/url]

Install v26 prime95 in a new directory. Tell prime95 you are
a stress tester so that it does not contact the server.

Add this to prime.txt:

StressTester=1
MinBenchFFT=4
MaxBenchFFT=32768
OnlyBench5678=0
BenchAllComplex=1
AllBench=1
NumCPUs=1

Then run Options/Benchmark. Post your results.txt file in this thread.

P.S. I really wouldn't recommend using this version for production work, though it will probably work.

Prime95 2010-06-22 18:47

If you want to do some QA, then do this:

Add this to prime.txt:

[qa]
MAX_B=5
MIN_N=100000
MAX_N=10000000
MAX_K_BITS=20
MAX_C_BITS_FOR_SMALL_K=2
MAX_C_BITS_FOR_LARGE_K=2

Then do Advanced/Time. Enter 9920 as the exponent.

Report any occurrences of the word "mismatch" in results.txt. For extra credit, report any instances where the round-off error exceeds 0.3.
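
For anyone who would rather not read through a large results.txt by eye, the check above could be scripted roughly as follows. This is a sketch under assumptions: the exact layout of the QA output is not described in this thread, so the pattern below simply guesses that a round-off value appears as a decimal number after "round off" or "MaxErr" on the same line. Adjust `ERR_PATTERN` to whatever the file really contains.

```python
import re

# ASSUMPTION: the error value is a decimal number following the text
# "round off" / "roundoff" / "MaxErr" on the same line. The real
# results.txt format may differ.
ERR_PATTERN = re.compile(
    r"(?:round\s*-?\s*off|MaxErr)[^0-9]*([0-9]*\.[0-9]+)", re.IGNORECASE
)


def scan_results(lines, threshold=0.3):
    """Return (mismatches, high_errors) found in an iterable of text lines.

    mismatches:  list of (line number, line text) containing "mismatch"
    high_errors: list of (line number, value) where value > threshold
    """
    mismatches, high_errors = [], []
    for lineno, line in enumerate(lines, start=1):
        if "mismatch" in line.lower():
            mismatches.append((lineno, line.rstrip()))
        match = ERR_PATTERN.search(line)
        if match and float(match.group(1)) > threshold:
            high_errors.append((lineno, float(match.group(1))))
    return mismatches, high_errors
```

Typical use would be `with open("results.txt") as f: print(scan_results(f))`. Empty lists for both results mean nothing needs reporting, provided the assumed pattern actually matches the file.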

enderak 2010-06-22 18:51

I can't seem to get the 64-bit version to download. Does not give an error, just sits there saying "starting" but never starts. 32-bit version downloads right away.

[EDIT] 32-bit doesn't want to download either now.

henryzz 2010-06-22 19:00

How about this: before it starts testing each FFT size for the first time, it runs a benchmark on the available FFT algorithms that look likely to be useful?
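
As a rough Python sketch of that idea (an illustration only, with made-up names such as `LazyFFTSelector`): the selector benchmarks the plausible candidates the first time a given FFT size is requested, then remembers the winner so later requests for that size cost nothing. The `candidates_for_size` and `time_fn` callbacks are assumptions standing in for whatever the real program would provide.

```python
class LazyFFTSelector:
    """Benchmark candidates for an FFT size on first use, then cache the choice."""

    def __init__(self, candidates_for_size, time_fn):
        # candidates_for_size(fft_size) -> list of implementation names
        # time_fn(fft_size, impl_name)  -> measured seconds (lower is better)
        self._candidates_for_size = candidates_for_size
        self._time_fn = time_fn
        self._chosen = {}

    def select(self, fft_size):
        if fft_size not in self._chosen:
            impls = self._candidates_for_size(fft_size)
            # Time each plausible candidate once and keep the fastest.
            self._chosen[fft_size] = min(
                impls, key=lambda impl: self._time_fn(fft_size, impl)
            )
        return self._chosen[fft_size]
```

Saving the cached choices to disk would avoid repeating the benchmark after a restart, at the cost of needing some way to invalidate them when the hardware changes.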

starrynte 2010-06-22 21:35

Do you need any i5 benchmarks / would they help?

Prime95 2010-06-22 22:13

[QUOTE=starrynte;219563]Do you need any i5 benchmarks / would they help?[/QUOTE]

I'd bet i5 benchmarks would mirror my Lynnfield i7. Now an i7-920 with three memory channels would be somewhat interesting.

Prime95 2010-06-22 22:33

[QUOTE=enderak;219545]I can't seem to get the 64-bit version to download. Does not give an error, just sits there saying "starting" but never starts. 32-bit version downloads right away.
[/QUOTE]

I changed the URL to FTP. See if that helps.

TheJudger 2010-06-23 09:31

Hello George,

Why not have a basic hardwired table for those users who just want to run Prime95, plus an option for advanced users to run some benchmarks to select "optimal" parameters for their machine?

Oliver

P.S. I can't test Windows binaries :sad:
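
The two-tier suggestion above might look something like this in Python. It is a sketch only: the cache thresholds and implementation names in `DEFAULT_TABLE` are invented placeholders, not values from any prime95 version, and `overrides` stands for whatever per-size results an optional user-run benchmark would store.

```python
import bisect

# HYPOTHETICAL defaults: (minimum L2+L3 cache in KB, implementation name),
# sorted by cache size. Real thresholds would come from the developer.
DEFAULT_TABLE = [
    (0, "impl_small_cache"),
    (1024, "impl_medium_cache"),
    (4096, "impl_large_cache"),
]


def default_choice(cache_kb, table=DEFAULT_TABLE):
    """Hardwired pick: the last table row whose threshold is <= cache_kb."""
    thresholds = [threshold for threshold, _ in table]
    idx = bisect.bisect_right(thresholds, cache_kb) - 1
    return table[max(idx, 0)][1]


def choose_impl(fft_size, cache_kb, overrides=None):
    """Prefer a benchmarked override for this FFT size, else the hardwired table."""
    if overrides and fft_size in overrides:
        return overrides[fft_size]
    return default_choice(cache_kb)
```

Users who never run the benchmark get `default_choice` behaviour unchanged, so the override path adds no risk for them.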

NBtarheel_33 2010-06-23 12:09

4.5 hours of QA
 
Here is the results.txt file from 4.5 hours of QA on Betsy, a 3.06 GHz PIV. There are NO mismatches that I can see, nor did MaxErr ever get above 0.3 (in fact I think the highest value was around 0.17-0.18).

I am going to run further QA on this system, as well as on the Core2 systems I have borged. But it looks good on the P4, initially!

Rhyled 2010-06-23 15:00

6hr QA on i7 920 3.73 GHz OC good MaxErr=0.28125
 
6 hours of running QA on my overclocked i7-920 came up with no mismatches and a MaxErr of 0.28125 (close to your 0.3 threshold). It would have been a longer test, but a Microsoft update rebooted the machine at 3 am.

One thing I noticed while running QA is that the total CPU load was only in the 25-50% range (mostly around 27%), which is far lower than I'm used to seeing with Prime95.

System Spec: Core i7 920 @ 3.73 GHz (overclocked). 6 GB DDR3 1424 MHz RAM in triple channel. No Hyperthreading.

[ATTACH]5380[/ATTACH] had to zip it - txt file is 475KB


All times are UTC.
