[QUOTE=preda;491835]I added a factor-9 step, and now there's a larger selection of FFT sizes:
[CODE]
   FFT    maxExp     W     H  M
  0.5M     10.3M   512   512  1
  1.0M     20.3M  1024   512  1
  2.0M     39.8M  2048   512  1
  2.0M     39.8M   512  2048  1
  2.5M     49.4M   512   512  5
  4.0M     78.0M  1024  2048  1
  4.0M     78.0M  4096   512  1
  4.5M     87.5M   512   512  9
  5.0M     96.9M  1024   512  5
  8.0M    153.0M  2048  2048  1
  9.0M    171.6M  1024   512  9
 10.0M    190.0M   512  2048  5
 10.0M    190.0M  2048   512  5
 16.0M    300.0M  4096  2048  1
 18.0M    336.3M  2048   512  9
 18.0M    336.3M   512  2048  9
 20.0M    372.5M  4096   512  5
 20.0M    372.5M  1024  2048  5
 36.0M    659.0M  1024  2048  9
 36.0M    659.0M  4096   512  9
 40.0M    730.0M  2048  2048  5
 72.0M   1290.9M  2048  2048  9
 80.0M   1429.8M  4096  2048  5
144.0M   2527.5M  4096  2048  9
[/CODE]
Now it's a bit easier to validate openowl on small known primes (e.g. M(1398269) in 6 minutes). For fun, it can also do things like 1Billion exponents in 39ms/it. (As I have not tested every FFT size precisely, bugs may be hiding around.)[/QUOTE]
Wow. I don't see the POT lengths 32M, 64M, or 128M there. Presumably something like
4096 4096 1
8192 4096 1
8192 8192 1
respectively. A ~1 billion exponent at 39ms/it is ~451 days or ~15 months to completion, without errors requiring repetition of GEC blocks. (On an RX Vega 64 presumably.) What version do you call this advance? |
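The arithmetic in the post above can be checked with a short Python sketch. It assumes that the FFT size in words equals 2 * W * H * M, a relation inferred from the rows of the table rather than stated by the author, and that a test needs roughly one iteration per bit of the exponent. The function names are made up for illustration.

```python
# Sketch only: the 2 * W * H * M relation is inferred from the table rows,
# and "one iteration per exponent bit" is an approximation.
MEG = 1 << 20  # the table's "M" for FFT sizes (0.5M = 2 * 512 * 512 words)


def fft_size(w, h, m):
    """FFT size in words for a W x H matrix FFT with middle factor M."""
    return 2 * w * h * m


def eta_days(exponent, ms_per_it):
    """Rough run time in days, assuming one iteration per exponent bit."""
    return exponent * ms_per_it / 1000 / 86400


# A row from the table, then the three power-of-two shapes guessed above.
print(fft_size(512, 512, 1) / MEG)     # 0.5
print(fft_size(4096, 4096, 1) / MEG)   # 32.0
print(fft_size(8192, 4096, 1) / MEG)   # 64.0
print(fft_size(8192, 8192, 1) / MEG)   # 128.0

# A 1 billion exponent at 39 ms/it.
print(round(eta_days(10**9, 39), 1))   # 451.4
```

Under those assumptions the three guessed shapes give the missing 32M, 64M and 128M sizes, and the 451-day estimate follows directly from 10^9 iterations at 39 ms each.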
[QUOTE=kriesel;491854]Wow.
I don't see the POT lengths 32M, 64M, or 128M there. Presumably something like
4096 4096 1
8192 4096 1
8192 8192 1
respectively. A ~1 billion exponent at 39ms/it is ~451 days or ~15 months to completion, without errors requiring repetition of GEC blocks. (On an RX Vega 64 presumably.) What version do you call this advance?[/QUOTE]
The "column" (H) step of the matrix FFT now does only 512 or 2048, which is why those sizes are missing. I agree that doing 1 billion exponents is likely not a good idea. I'll probably bump the version to 3.4; I just want to do a bit more tuning/validation before that. |
[QUOTE=preda;491858]The "column" (H) step of the matrix FFT now does only 512 or 2048, that's why those sizes are missing.
I agree that doing 1 billion exponents is likely not a good idea. I'll probably bump the version to 3.4; I just want to do a bit more tuning/validation before that.[/QUOTE]
The argument "-list fft" shows an empty list and says Bye. |
[QUOTE=SELROC;491861]The argument "-list fft" shows an empty list and says Bye.[/QUOTE]
Yes, but only if the worktodo.txt file is empty; otherwise it lists the FFTs and starts the computation! |
I re-enabled the display of devices with "-h" (OpenCL only), and re-enabled the kernel profiling with "-time" (OpenCL only).
In the list of FFT sizes, there are in places multiple successive lines with the same size. By default the app selects the first line for a given size. The others can be selected easily with "-fft +1", "-fft +2", etc. There are small performance differences between them, so the user can investigate and choose the fastest. There is no "auto-tuning" yet (where the program automatically times and selects the fastest). |
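The selection rule described above can be modelled with a small Python sketch. This is not the program's code: `CONFIGS`, `fft_size` and `select` are hypothetical names, the 2 * W * H * M size relation is inferred from the table posted earlier in the thread, and the sketch assumes that "+N" counts lines down from the first line of the same size. What happens when N runs past the lines of that size is not stated in the thread, so the sketch simply raises an error.

```python
# Hypothetical model of "-fft +N"; not taken from the program's source.
MEG = 1 << 20

# (W, H, M) lines in the order they appear in the posted list (excerpt).
CONFIGS = [
    (4096, 2048, 1),   # 16M
    (2048, 512, 9),    # 18M, first line: assumed default for this size
    (512, 2048, 9),    # 18M, second line: assumed target of "-fft +1"
    (4096, 512, 5),    # 20M
    (1024, 2048, 5),   # 20M
]


def fft_size(w, h, m):
    # Assumption inferred from the table: size in words = 2 * W * H * M.
    return 2 * w * h * m


def select(size_words, offset=0):
    """Return the (W, H, M) line `offset` places after the first line of this size."""
    same_size = [c for c in CONFIGS if fft_size(*c) == size_words]
    if offset >= len(same_size):
        raise IndexError("no such variant for this FFT size")
    return same_size[offset]


print(select(18 * MEG))      # (2048, 512, 9)
print(select(18 * MEG, 1))   # (512, 2048, 9)
```

Since there is no auto-tuning yet, the practical use is to time each variant for a given size by hand and keep the fastest one.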
[QUOTE=preda;491904]I re-enabled the display of devices with "-h" (OpenCL only), and re-enabled the kernel profiling with "-time" (OpenCL only).
In the list of FFT sizes, there are in places multiple successive lines with the same size. By default the app selects the first line for a given size. The others can be selected easily with "-fft +1", "-fft +2", etc. There are small performance differences between them, so the user can investigate and choose the fastest. There is no "auto-tuning" yet (where the program automatically times and selects the fastest).[/QUOTE]
Testing a 300M exponent with version 3.4: the selected FFT size is 18M. Now I am using -fft +1 (18M FFT) and the timing went from 18 ms/it to 16 ms/it. The ETA went from 62d to 56d. |
Pushing the GPU fan up a bit:
[CODE]
amdgpu-pci-6700
Adapter: PCI adapter
vddgfx:  +1.06 V
fan1:    3276 RPM
temp1:   +70.0°C (crit = +89.0°C, hyst = -273.1°C)
power1:  206.00 W (cap = 220.00 W)
[/CODE]
I get just under 9ms/it for "100M digits" exponents; Vega64, 205W, temperature 70C.
[CODE]
vega0 16570000/332193109 [ 4.99%], 8.95 ms/it [8.94, 8.95]; ETA 32d 16:31; f6b94760b829ddec
[/CODE]
This is with the amdgpu-pro 18.20 driver. There is hope that ROCm may be a bit better still (when I can install it). |
[QUOTE=SELROC;491917]Testing a 300M exponent with version 3.4: the selected FFT size is 18M. Now I am using -fft +1 (18M FFT) and the timing went from 18 ms/it to 16 ms/it. The ETA went from 62d to 56d.[/QUOTE]
It is actually possible to use -fft 16M with a 300M exponent, the timing goes down to 14 ms/it ... |
[QUOTE=SELROC;491968]It is actually possible to use -fft 16M with a 300M exponent, the timing goes down to 14 ms/it ...[/QUOTE]
Yes, but you're in the danger zone at that bits-per-word level. You're likely to encounter numerical errors, that will trigger retries. That works fine, only that it costs some time. In such a situation it may be worth starting the exponent with a lower "block size" than the default of 400, with "-block 100" or "-block 200". |
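To put a number on the "bits-per-word level" mentioned above, here is a small Python sketch. It assumes that bits per word is simply the exponent divided by the FFT size in words, and it uses 300,000,000 as a stand-in because the exact exponent is not given in the thread. The function name is made up for illustration.

```python
# Sketch only: bits per word taken as exponent / FFT size in words.
MEG = 1 << 20  # FFT sizes in the thread are multiples of 2**20 words


def bits_per_word(exponent, fft_megawords):
    """Average number of exponent bits stored in each FFT word."""
    return exponent / (fft_megawords * MEG)


exponent = 300_000_000  # stand-in for the "300M exponent"

print(f"{bits_per_word(exponent, 16):.2f}")  # 17.88
print(f"{bits_per_word(exponent, 18):.2f}")  # 15.89
```

The 16M FFT packs about two more bits into every word than the 18M FFT does, which leaves less headroom for rounding error. The FFT table earlier in the thread lists 300.0M as the maximum exponent for a 16M FFT, so this exponent sits right at that limit.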
[QUOTE=preda;491979]Yes, but you're in the danger zone at that bits-per-word level. You're likely to encounter numerical errors, that will trigger retries. That works fine, only that it costs some time. In such a situation it may be worth starting the exponent with a lower "block size" than the default of 400, with "-block 100" or "-block 200".[/QUOTE]
Fine tuning is not an easy task. I have to investigate when time permits :-) |
[QUOTE=SELROC;491981]Fine tuning is not an easy task. I have to investigate when time permits :-)[/QUOTE]
OK [QUOTE=preda;491979]Yes, but you're in the danger zone at that bits-per-word level. You're likely to encounter numerical errors, that will trigger retries. That works fine, only that it costs some time. In such a situation it may be worth starting the exponent with a lower "block size" than the default of 400, with "-block 100" or "-block 200".[/QUOTE]
Using the master version:
1) I have tried -block 200, but it seems to stick stubbornly to blockSize 400.
2) I have noticed that for the 300M exponent the best FFT is 18M, even though it is slower. With 16M the ETA kept moving back by a considerable amount (hours), which I suppose means a lot of retries were done. No error was reported, I mean no EE, but the timing varied considerably. |