mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Lounge (https://www.mersenneforum.org/forumdisplay.php?f=7)
-   -   FFT-Size (https://www.mersenneforum.org/showthread.php?t=1598)

andi314 2003-12-07 13:00

FFT-Size
 
How can i calculate the FFT size which i have to use for the different exponents???

andi314

ewmayer 2003-12-08 00:09

[url]http://www.mersenneforum.org/attachments/pdfs/F24.pdf[/url]

Equation 8.

Reboot It 2003-12-08 14:35

On a somewhat related note, has anybody produced a table of RAM used by Prime95 during a LL test vs. FFT size?

ewmayer 2003-12-08 20:22

Not sure exactly what auxiliary arrays (besides the main residue array) Prime95 allocates, but for a length-N FFT (taht is, N real elements) a typical number would be N*8 bytes, plus an additional 10-50% of that for the auxiliary-data arrays.

markhl 2003-12-09 03:25

Re: FFT-Size
 
[QUOTE][i]Originally posted by andi314 [/i]
[B]How can i calculate the FFT size which i have to use for the different exponents???

andi314 [/B][/QUOTE]

[url]http://www.mersenne.org/bench.htm[/url]

ewmayer 2003-12-09 15:34

Re: Re: FFT-Size
 
[QUOTE][i]Originally posted by markhl [/i]
[B][url]http://www.mersenne.org/bench.htm[/url] [/B][/QUOTE]

I thought Andi meant in an a priori fashion - Prime95's tables are a posteriori data, based on trial and (roundoff) error. Andi, in which way did you intend your question?

hbock 2003-12-09 17:57

I think a table with FFT sizes for smaller exponents would be also helpful for people doing ECM factoring on those numbers, just to estimate the minimum of RAM needed. Usually the RAM spent for ECM should be at least 192 times the FFT size (see prime95 help).

In which way the RAM required for P-1 also depends on the FFT size ? I mean especiallly for higher B1/2 values.

ewmayer 2003-12-10 16:42

[QUOTE][i]Originally posted by hbock [/i]
[B]I think a table with FFT sizes for smaller exponents would be also helpful for people doing ECM factoring on those numbers, just to estimate the minimum of RAM needed. Usually the RAM spent for ECM should be at least 192 times the FFT size (see prime95 help).

In which way the RAM required for P-1 also depends on the FFT size ? I mean especiallly for higher B1/2 values. [/B][/QUOTE]

Since most PCs nowadays have at least 256MB of RAM, 192 times the FFT size only becomes an issue when the number being factored takes up > 1MB in floating-point form, which translates to exponents ~> 2.5M .

Both p-1 and ECM are similar in terms of their stage-2 memory usage vs. efficiency trends (though I believe stage 2 ECM to a given bound needs appreciably more memory than stage 2 p-1 to the same bound), so I'll use p-1 to explain it: in stage 2 of p-1 you need to power the stage 1 residue (call it R) to the product of the various stage 2 primes, and to do so efficiently you need to calculate R raised to various even powers corresponding to the various gaps between the stage 2 primes. Ideally you'd simply precalculate R raised to all the prime gaps <= B2 and store those in memory, but for B2 large the gaps can be so large as to make this practically impossible. However, since the larger gaps occur quite rarely, it costs very little to multiply together a few smaller prestored powers to cover the larger gaps when they are encountered. For CPU-efficiency reasons we want all our precomputed powers to fit into RAM, but we also want there to be enough of these so we can cover nearly all the prime gaps that actually occur using the stored powers, i.e. without needing extra multiplies. Practically speaking, a few dozen precomputed powers suffices to cover all but a very small percentage of the gaps, and beyond that you get diminishing returns. In other words, there's a big difference in running p-1 stage 2 on a ~20M-sized exponent using just 64MB RAM (which would only allow the active stage 2 residue and 7 precomputed powers (R^2,4,6,8,10,12,14) to be stored (and there are many prime gaps > 14, so we'd be doing a lot of extra muls, e.g. R^2*R^14 to get R^16) and doing stage 2 using 256 MB of RAM, which would allow ~32 precomputed powers, i.e. R^2 through R^64. There will only be a few % added speedup in going from 256MB to 512MB, and even less in going from 512 to 1024.

andi314 2003-12-10 19:06

i meant the fft size concerning a ll-test

garo 2003-12-11 03:16

This is a post by George on the mailing list the day before these forums opened :)


This table shows FFT size, v21 x87 crossover, v22.8 x87 crossover,
v21 SSE2 crossover, v22.8 SSE2 crossover. As you can see v22
is more liberal with x87 crossovers and more conservative with
SSE2 crossovers.
[CODE]
262144 5255000 5255000 5185000 5158000
327680 6520000 6545000 6465000 6421000
393216 7760000 7779000 7690000 7651000
458752 9040000 9071000 8970000 8908000
524288 10330000 10380000 10240000 10180000
655360 12830000 12890000 12720000 12650000
786432 15300000 15340000 15160000 15070000
917504 17850000 17890000 17660000 17550000
1048576 20400000 20460000 20180000 20050000
1310720 25350000 25390000 25090000 24930000
1572864 30150000 30190000 29920000 29690000
1835008 35100000 35200000 34860000 34560000
2097152 40250000 40300000 39780000 39500000
2621440 50000000 50020000 49350000 49100000
3145728 59400000 59510000 58920000 58520000
3670016 69100000 69360000 68650000 68130000
4194304 79300000 79300000 78360000 77910000
[/CODE]
Now the gotcha. In v22.8, FFT crossovers are flexible. If you test an
exponent within 0.2% of a crossover point, then 1000 sample iterations
are performed using the smaller FFT size and the average roundoff
error calculated. If the average is less than 0.241 for a 256K FFT or
0.243 for a 4M FFT, then the smaller FFT size is used.

Brian Beesley has been a great help in investigating revised crossover
points and analyzing the distribution of round off errors. We noticed
that consecutive exponents can have a pretty big difference in average
roundoff error (e.g. one exponent could be 0.236 and the next 0.247).
This is why I elected to try the flexible approach described above. The
0.241 to 0.243 average was chosen hoping for about 1 iteration in a
million generating a roundoff error above 0.4. We might change the 0.241
to 0.243 constants with more data - it is hard to get enough data points
to accurately measure 1 in a million occurrences.

One downside is the server does not know which FFT size is used and
will credit you based on the v21 x87 crossovers. Thus, if you are a lucky
person, you might get "bonus" CPU credit where you test the exponent
at a smaller FFT size and the server credits you based on the larger
FFT's timing.

jinydu 2006-11-24 01:26

I'm using version 24.13, and judging by an increase in the per iteration time of well over 10%, there is a FFT length change somewhere between 17440543 and 17776357.

Unfortunately, I suspect that Primenet may not recognize this jump and could underestimate my credit by over 10% when M17,776,357 finally finishes (it just started less than an hour ago) :sad: . Time will tell...

EDIT: After playing around with the Time function in the Advanced menu, I have found that the jump occurs at precisely 17.55M. Oddly enough, the per iteration times I got during the Time test were far better than my usual times. Then, when I resumed the usual test, my per iteration time improved dramatically!


All times are UTC. The time now is 04:53.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.