#144 | |
|
Dec 2011
New York, U.S.A.
1100001₂ Posts |
Quote:
However, I don't remember right now what the exact conditions of the test were -- for example, I may have been looking more at the initialization code than at the genefer algorithm -- so I wouldn't put too much weight on those results. But you should be able to get the information you want using their tools.
#145 |
|
Jun 2003
13DD₁₆ Posts |
#146 |
|
Dec 2011
New York, U.S.A.
97 Posts |
#147 | ||
|
Dec 2011
New York, U.S.A.
97 Posts |
Quote:
Quote:
#148 | |
|
Jun 2003
11735₈ Posts |
Quote:
#149 | |
|
Dec 2011
New York, U.S.A.
97 Posts |
Quote:
Here's the relevant code (everything except the declarations):
Code:
// Time the squaring step.
SETCLOCK(clock1);
FFTsquareGFN(z, n1, n2);
squareTime += elapsedTime(clock1);
// Extract bit i from the packed 32-bit word array Na.
bt = (Na[i / (8 * sizeof(uint32_t))] >> (i % (8 * sizeof(uint32_t)))) & 1;
// Time the next-step routine separately.
SETCLOCK(clock1);
FFTnextStepGFN(z, b, (bt == 0) ? t1 : t2, n1, n2, t3);
nextTime += elapsedTime(clock1);
...
// Avoid dividing by zero when nothing was timed.
if (squareTime + nextTime == 0.0) squareTime = nextTime = 1.0;
printf("\n FFTsquareGFN=%.2f%% FFTnextStepGFN=%.2f%% (raw: %.4f %.4f seconds)\n",
       100*squareTime/(squareTime+nextTime), 100*nextTime/(squareTime+nextTime),
       squareTime, nextTime);
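The bit lookup in the snippet above indexes a packed array of 32-bit words. Here is a minimal sketch of that indexing in isolation, assuming bit 0 is the least significant bit of word 0. `testBit` is a hypothetical helper name for illustration, not something from genefer:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of genefer): returns bit i of a multiword
// value stored as 32-bit words, with bit 0 being the least significant bit
// of words[0]. Mirrors the Na[i / 32] >> (i % 32) & 1 expression.
inline unsigned testBit(const std::vector<std::uint32_t>& words, std::size_t i) {
    constexpr std::size_t kBitsPerWord = 8 * sizeof(std::uint32_t);  // 32
    // Pick the word, shift the wanted bit down to position 0, mask it off.
    return (words[i / kBitsPerWord] >> (i % kBitsPerWord)) & 1u;
}
```

The explicit parentheses are only for readability: `>>` already binds tighter than `&`, so the original expression evaluates the same way.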
#150 |
|
Jun 2003
3²·5·113 Posts |
I know. It was one of those idle hopes. The outcome is serious enough that I have just gone ahead and ordered my very own GT 520 (the cheapest DP-capable part I could find) to explore this further. Hopefully, before it arrives, I will have finished the Linux port of the sieve and have some free time.
#151 | |
|
Dec 2011
New York, U.S.A.
97 Posts |
Quote:
The ONLY code changes to measure the internal timings are in the check() routine, so you can just plug in my modified genefer.cpp file and build the other versions of genefer. It would be interesting to see whether the CPU versions show the same behavior.
#152 | |
|
Jun 2003
3²×5×113 Posts |
Quote:
The GPU numbers are definitely an anomaly.
#153 | |
|
Dec 2011
New York, U.S.A.
97₁₀ Posts |
Figured it out. The square and next-step stages were running in parallel and overlapping each other, which was messing up the metrics. Serializing the steps slowed the program down a bit, but it does yield more accurate measurements:
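The post doesn't show how the steps were serialized, so the following is only a rough illustration of why overlap distorts host-side timers. It uses `std::async` as a stand-in for an asynchronous GPU launch; `launchAsync` and `timeStep` are hypothetical names, not genefer functions. In CUDA, the analogous wait would typically be something like `cudaDeviceSynchronize()` before reading the clock.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;

// Stand-in for an asynchronous launch (assumption: real code would launch a
// GPU kernel here). Returns immediately while the work runs on another thread.
inline std::future<void> launchAsync(std::chrono::milliseconds work) {
    return std::async(std::launch::async,
                      [work] { std::this_thread::sleep_for(work); });
}

// Times one step and returns elapsed seconds as seen by the host.
// If serialize is false, the clock stops right after the launch, so the
// result reflects launch overhead only. If serialize is true, the clock
// stops after the work has finished.
// The caller must pass an empty future: assigning over a still-pending
// std::async future would itself block until that earlier work completes.
inline double timeStep(std::chrono::milliseconds work, bool serialize,
                       std::future<void>& pending) {
    const auto start = Clock::now();
    pending = launchAsync(work);
    if (serialize) pending.wait();  // block until the work is really done
    return std::chrono::duration<double>(Clock::now() - start).count();
}
```

Without the wait, the time a step actually takes is charged to whichever later call happens to block, which is one way the square/next-step split could come out skewed. The wait costs some throughput because the host can no longer queue the next step early.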
Quote:
#154 | |
|
Jun 2003
1001111011101₂ Posts |
Quote:
Anyway, once I have my GT 520, I'll do some visual profiling and see if any improvement can be wrung out of the code. But I'm not holding my breath for any big improvements.