#1
P90 years forever!
Aug 2002
Yeehaw, FL
7532₁₀ Posts
I want to add a feature to prime95 to select the best FFT implementation for the user's computer. The need for this became all-too-apparent in 28.9 when some reported the new version was slower than 28.6 and others found it faster.
My plan is to do a throughput benchmark and write the results to local.txt. When starting an LL test, prime95 looks at the benchmark data and selects the fastest FFT. The open issues where your ideas may be helpful:
1) The benchmark really wants to run when nothing else is going on. Even so, the OS may fire up some process that skews a run. So, I need a good algorithm that tracks multiple runs and throws out outlier data.
2) My first thought was to have a menu choice to run the throughput benchmark. I suspect it will not get run on many machines. Alternatively, I could run the benchmark when prime95 is launched or at a late night hour or both until we have enough runs we are confident in our throughput data. Ideas? How many runs before we are confident in the results?
3) Any ideas on how best to detect a significant change (CPU, memory, whatever?) and discard the accumulated data?
Last fiddled with by Prime95 on 2016-04-26 at 20:34
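One possible take on issue 1, purely as a sketch: drop runs that sit far from the median, using the median absolute deviation (MAD) as the yardstick, then rank implementations by the median of what is left. The function names, the 3.5 cutoff, the 1% floor, and the sample numbers are all illustrative assumptions, not anything prime95 does.

```python
from statistics import median

def filter_outliers(samples, threshold=3.5, rel_floor=0.01):
    """Drop throughput samples that sit far from the median.

    The spread is estimated with the median absolute deviation (MAD),
    which a single skewed run barely moves. `threshold` and `rel_floor`
    are illustrative defaults, not tuned values.
    """
    if len(samples) < 3:
        return list(samples)          # too few runs to judge anything
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    # 1.4826 * MAD approximates a standard deviation for normal data.
    # The relative floor keeps near-identical runs from making the
    # filter reject tiny, harmless differences.
    scale = max(1.4826 * mad, rel_floor * abs(med))
    return [x for x in samples if abs(x - med) <= threshold * scale]

def pick_fastest(results):
    """Return the implementation with the highest median filtered throughput.

    `results` maps a (hypothetical) implementation name to a list of
    throughput samples, e.g. iterations per second.
    """
    scores = {name: median(filter_outliers(runs))
              for name, runs in results.items() if runs}
    return max(scores, key=scores.get)

# Made-up numbers: fft_a had one run skewed low by a background process,
# fft_b had one run that measured implausibly high.
results = {
    "fft_a": [100.0, 101.0, 99.5, 100.5, 62.0],
    "fft_b": [97.0, 98.0, 97.5, 140.0],
}
print(pick_fastest(results))
```

With these made-up numbers a plain average would rank fft_b first (108.1 against 92.6) only because of the two skewed runs. The median of the filtered data ranks fft_a first.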
#2
"/X\(‘-‘)/X\"
Jan 2013
2·5·293 Posts
The optimal FFT size may also change depending on the load on the machine throughout the day.
How quickly can you run a benchmark and get an accurate result? Would it be possible to run a micro-benchmark every hour, or every so many iterations, and adjust to current conditions? I'm not sure of any easy way to detect change in memory bandwidth, for instance, and so micro-benchmarks may be the way to go.
#3
P90 years forever!
Aug 2002
Yeehaw, FL
2²·7·269 Posts
I was going to use 20 second benchmarks running on all cores to determine a machine's throughput for each possible FFT implementation.
#4
"/X\(‘-‘)/X\"
Jan 2013
2×5×293 Posts
Run hourly, that would be a 0.6% overhead. Would it be possible to pick the three or four most likely fastest implementations and benchmark just those?
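The 0.6% figure is one 20-second benchmark per hour (20 / 3600). A small sketch of how that scales if several implementations are each benchmarked every hour. The function name and the candidate count of 4 are made up for illustration.

```python
def benchmark_overhead(seconds_per_benchmark, implementations, interval_seconds=3600):
    """Fraction of wall-clock time spent benchmarking instead of doing real work."""
    return seconds_per_benchmark * implementations / interval_seconds

# One 20-second benchmark per hour.
print(f"{benchmark_overhead(20, 1):.1%}")
# Four candidate implementations, 20 seconds each, every hour.
print(f"{benchmark_overhead(20, 4):.1%}")
```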
#5
P90 years forever!
Aug 2002
Yeehaw, FL
1110101101100₂ Posts
I was thinking we'd run a handful and be done, or maybe a dozen if we're not getting consistent results, or something. This is not something to be done on a continuing basis.
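A sketch of what such a stopping rule could look like, assuming "a handful" means 5 runs, "a dozen" means 12, and "consistent" means within 2% of the median. All three numbers and the function name are assumptions for illustration only.

```python
from statistics import median

def needs_more_runs(samples, min_runs=5, max_runs=12, tolerance=0.02):
    """Decide whether another benchmark run should be scheduled.

    Stop once at least `min_runs` samples agree with the median to within
    `tolerance` (relative), or once `max_runs` have been collected
    regardless of how noisy they are.
    """
    n = len(samples)
    if n < min_runs:
        return True                   # not even a handful yet
    if n >= max_runs:
        return False                  # a dozen runs: give up and use what we have
    med = median(samples)
    agreeing = sum(1 for x in samples if abs(x - med) <= tolerance * abs(med))
    return agreeing < min_runs

print(needs_more_runs([100.0, 100.5, 99.8, 100.2, 100.1]))  # consistent handful
print(needs_more_runs([100.0, 80.0, 101.0, 120.0, 99.0]))   # noisy handful
```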
#6
Jun 2003
2²·3³·47 Posts
Quote:
However, that requires a fixed worker/thread configuration. If that also needs to be tuned simultaneously, then it is not feasible to do it.
#7
Jun 2005
USA, IL
193 Posts
Can you temporarily increase the program priority?
#8
Aug 2002
2×3×29 Posts
My suggestions
Last fiddled with by xtreme2k on 2016-04-27 at 11:10
#9
Dec 2014
3·5·17 Posts
If we have 4 FFT algorithms to choose from, and we have been assigned an exponent of 70M, can prime95 do this?
1. Run FFT 1 for iterations 1 to 1000 (for the 70M exponent)
2. Run FFT 2 for iterations 1001 to 2000
3. Run FFT 3 for iterations 2001 to 3000
4. (you get the idea)
5. Run FFT 1 again for 4001 to 5000
...
Run FFT (best) for iterations 20001 to 70M

Keep cycling until prime95 gets enough samples to be able to pick between the 4 choices. If some runs for an FFT disagree with other runs, then some other process may have been active on the system during a run, so we might need more samples. The advantage is that all CPU time is used to finish real LL work.

If wall clock elapsed time = CPU usage, then no other process has run to compete with prime95. (See Windows API GetProcessTimes.)

If the process page fault rate is low, then no other process is competing for memory. (See Windows API GetProcessMemoryInfo. Windows will take memory from a process even when the system has lots of free memory. When the page is needed again, it causes a "soft" page fault. A hard page fault requires a disk access. A soft page fault is when the page is already in memory but marked available. GetProcessMemoryInfo includes both soft and hard faults, so it will never show 0 page faults. The PDH library lets a process access perfmon data, which has soft and hard faults separated.)
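The round-robin idea above can be sketched roughly as follows. This is an illustration under assumptions, not prime95 code: each implementation is a hypothetical callable that performs a given number of real iterations, the worker is single-threaded (so CPU time close to wall time suggests nothing else competed for the core), and Python's `time.process_time` and `time.perf_counter` stand in for the CPU-time and wall-clock measurements. The names, the 0.95 ratio, and the 5 rounds are made up.

```python
import time
from statistics import median

def timed_block(run_iterations, count):
    """Run `count` iterations; return (wall seconds, CPU seconds)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    run_iterations(count)
    return time.perf_counter() - wall0, time.process_time() - cpu0

def pick_fft(implementations, block_size=1000, rounds=5,
             min_cpu_ratio=0.95, timer=timed_block):
    """Cycle through the candidates on real work, then pick the fastest.

    `implementations` maps a name to a callable taking an iteration count.
    A block whose CPU time falls well short of its wall time is assumed to
    have been disturbed by another process: its timing is discarded, but
    the iterations it completed still count as finished work.
    Returns (best name or None, iterations completed).
    """
    samples = {name: [] for name in implementations}
    done = 0
    for _ in range(rounds):
        for name, run_iterations in implementations.items():
            wall, cpu = timer(run_iterations, block_size)
            done += block_size
            if wall > 0 and cpu / wall >= min_cpu_ratio:
                samples[name].append(wall)
    usable = [name for name in samples if samples[name]]
    if not usable:
        return None, done             # every block was disturbed; try again later
    best = min(usable, key=lambda name: median(samples[name]))
    return best, done

# Stand-in workloads; a real caller would pass functions that advance the LL test.
demo = {
    "fft1": lambda n: sum(i * i for i in range(n * 20)),
    "fft2": lambda n: sum(i * i for i in range(n * 10)),
}
best, done = pick_fft(demo)
print(best, done)
```

With 4 candidates, 5 rounds of 1000-iteration blocks reach iteration 20000, which matches the numbers in the list above. Whether switching implementations mid-test is practical is a separate question that this sketch does not address.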
#10
Feb 2012
3⁴×5 Posts
Can we figure out what makes certain algorithms run faster on certain architectures, so that no benchmarking is necessary?
What are we up against in this case? Intel’s secrecy?
#11
"/X\(‘-‘)/X\"
Jan 2013
2×5×293 Posts
The problem is that different memory bandwidths, cache bandwidths, cache sizes, the exponent, and the number of cores running can affect what is the optimal FFT. There are too many variables.