![]() |
|
|
#1 |
|
Nov 2002
7410 Posts |
How can i calculate the FFT size which i have to use for the different exponents???
andi314 |
|
|
|
|
|
#2 |
|
∂2ω=0
Sep 2002
República de California
103·113 Posts |
|
|
|
|
|
|
#3 |
|
Aug 2002
London, UK
7·13 Posts |
On a somewhat related note, has anybody produced a table of RAM used by Prime95 during a LL test vs. FFT size?
|
|
|
|
|
|
#4 |
|
∂2ω=0
Sep 2002
República de California
103·113 Posts |
Not sure exactly what auxiliary arrays (besides the main residue array) Prime95 allocates, but for a length-N FFT (taht is, N real elements) a typical number would be N*8 bytes, plus an additional 10-50% of that for the auxiliary-data arrays.
Last fiddled with by ewmayer on 2003-12-08 at 20:23 |
|
|
|
|
|
#5 | |
|
Apr 2003
California
1348 Posts |
Quote:
|
|
|
|
|
|
|
#6 | |
|
∂2ω=0
Sep 2002
República de California
103×113 Posts |
Quote:
|
|
|
|
|
|
|
#7 |
|
Feb 2003
163 Posts |
I think a table with FFT sizes for smaller exponents would be also helpful for people doing ECM factoring on those numbers, just to estimate the minimum of RAM needed. Usually the RAM spent for ECM should be at least 192 times the FFT size (see prime95 help).
In which way the RAM required for P-1 also depends on the FFT size ? I mean especiallly for higher B1/2 values. |
|
|
|
|
|
#8 | |
|
∂2ω=0
Sep 2002
República de California
103×113 Posts |
Quote:
Both p-1 and ECM are similar in terms of their stage-2 memory usage vs. efficiency trends (though I believe stage 2 ECM to a given bound needs appreciably more memory than stage 2 p-1 to the same bound), so I'll use p-1 to explain it: in stage 2 of p-1 you need to power the stage 1 residue (call it R) to the product of the various stage 2 primes, and to do so efficiently you need to calculate R raised to various even powers corresponding to the various gaps between the stage 2 primes. Ideally you'd simply precalculate R raised to all the prime gaps <= B2 and store those in memory, but for B2 large the gaps can be so large as to make this practically impossible. However, since the larger gaps occur quite rarely, it costs very little to multiply together a few smaller prestored powers to cover the larger gaps when they are encountered. For CPU-efficiency reasons we want all our precomputed powers to fit into RAM, but we also want there to be enough of these so we can cover nearly all the prime gaps that actually occur using the stored powers, i.e. without needing extra multiplies. Practically speaking, a few dozen precomputed powers suffices to cover all but a very small percentage of the gaps, and beyond that you get diminishing returns. In other words, there's a big difference in running p-1 stage 2 on a ~20M-sized exponent using just 64MB RAM (which would only allow the active stage 2 residue and 7 precomputed powers (R^2,4,6,8,10,12,14) to be stored (and there are many prime gaps > 14, so we'd be doing a lot of extra muls, e.g. R^2*R^14 to get R^16) and doing stage 2 using 256 MB of RAM, which would allow ~32 precomputed powers, i.e. R^2 through R^64. There will only be a few % added speedup in going from 256MB to 512MB, and even less in going from 512 to 1024. |
|
|
|
|
|
|
#9 |
|
Nov 2002
2×37 Posts |
i meant the fft size concerning a ll-test
|
|
|
|
|
|
#10 |
|
Aug 2002
Termonfeckin, IE
22·691 Posts |
This is a post by George on the mailing list the day before these forums opened :)
This table shows FFT size, v21 x87 crossover, v22.8 x87 crossover, v21 SSE2 crossover, v22.8 SSE2 crossover. As you can see v22 is more liberal with x87 crossovers and more conservative with SSE2 crossovers. Code:
262144 5255000 5255000 5185000 5158000 327680 6520000 6545000 6465000 6421000 393216 7760000 7779000 7690000 7651000 458752 9040000 9071000 8970000 8908000 524288 10330000 10380000 10240000 10180000 655360 12830000 12890000 12720000 12650000 786432 15300000 15340000 15160000 15070000 917504 17850000 17890000 17660000 17550000 1048576 20400000 20460000 20180000 20050000 1310720 25350000 25390000 25090000 24930000 1572864 30150000 30190000 29920000 29690000 1835008 35100000 35200000 34860000 34560000 2097152 40250000 40300000 39780000 39500000 2621440 50000000 50020000 49350000 49100000 3145728 59400000 59510000 58920000 58520000 3670016 69100000 69360000 68650000 68130000 4194304 79300000 79300000 78360000 77910000 exponent within 0.2% of a crossover point, then 1000 sample iterations are performed using the smaller FFT size and the average roundoff error calculated. If the average is less than 0.241 for a 256K FFT or 0.243 for a 4M FFT, then the smaller FFT size is used. Brian Beesley has been a great help in investigating revised crossover points and analyzing the distribution of round off errors. We noticed that consecutive exponents can have a pretty big difference in average roundoff error (e.g. one exponent could be 0.236 and the next 0.247). This is why I elected to try the flexible approach described above. The 0.241 to 0.243 average was chosen hoping for about 1 iteration in a million generating a roundoff error above 0.4. We might change the 0.241 to 0.243 constants with more data - it is hard to get enough data points to accurately measure 1 in a million occurrences. One downside is the server does not know which FFT size is used and will credit you based on the v21 x87 crossovers. Thus, if you are a lucky person, you might get "bonus" CPU credit where you test the exponent at a smaller FFT size and the server credits you based on the larger FFT's timing. Last fiddled with by garo on 2003-12-11 at 03:17 |
|
|
|
|
|
#11 |
|
Dec 2003
Hopefully Near M48
2×3×293 Posts |
I'm using version 24.13, and judging by an increase in the per iteration time of well over 10%, there is a FFT length change somewhere between 17440543 and 17776357.
Unfortunately, I suspect that Primenet may not recognize this jump and could underestimate my credit by over 10% when M17,776,357 finally finishes (it just started less than an hour ago) . Time will tell...EDIT: After playing around with the Time function in the Advanced menu, I have found that the jump occurs at precisely 17.55M. Oddly enough, the per iteration times I got during the Time test were far better than my usual times. Then, when I resumed the usual test, my per iteration time improved dramatically! Last fiddled with by jinydu on 2006-11-24 at 02:01 |
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Max FFT Size? | aketilander | Software | 44 | 2018-12-19 17:58 |
| Force FFT-size to be used | kruoli | Software | 4 | 2017-11-17 18:14 |
| Pi(x) value for x at 10^16 size | edorajh | Computer Science & Computational Number Theory | 6 | 2017-03-08 20:28 |
| Size optimization | Sleepy | Msieve | 14 | 2011-10-20 10:27 |
| Exponent Size Gap | Mini-Geek | PrimeNet | 8 | 2007-03-25 07:29 |