Quote:
Originally posted by ewmayer
Just build a version with all your usual compiler flags and also with xcollect, then run one or more of the selftest sets, then incorporate the RTP data that were collected by doing a final build with xuse replacing xcollect. I believe Bill Rea got a nice (10-20%) speedup at most FFT lengths this way. Note that the optimal FFT radix sets may change once profiling has been done.

Hello Ernst,
Okay, I've recompiled the code several times and have finally come up with the two versions I used for testing and timings. One is the regular compile, while the other is the runtime-profiled version. I will post the two mlucas.cfg files, which include the FFT size, the fastest radix set index, and its associated clocks.
The RTP version is typically faster, except above the 4096K FFT size, where the profiled version is a little slower at most lengths.
As you said, the radix index sets are different.