mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   A new algorithm for generic modular reduction (https://www.mersenneforum.org/showthread.php?t=28562)

Mysticial 2023-04-11 22:24

[QUOTE=henryzz;628246]Presumably this ~doubles the FFT length that will fit in a L3 cache. This could be very useful for some cache size/FFT length combinations.[/QUOTE]

Yup. Or in more general, any memory boundary where the next level is bandwidth-bound and significantly slower than the lower level.

So in addition to doing it at the highest level to reduce a computation's peak memory/disk consumption (and thus require fewer hard drives), I also do it at the memory/disk boundary.

Given two numbers A and B that reside on disk and I want to compute A x B:[LIST=1][*]Check to see if they can be pulled into memory and done at full speed.[*]If the above fails, see if they can be pulled into memory and done using cyclic/negacyclic CRT. (which I internally call it the "Symmetric CRT" algorithm)[*]If it still fails, then use the disk FFT because you're out of options.[/LIST]
I don't do it between cache and memory though. Cache is too volatile for a heterogeneous workload where other threads may be poorly behaved.

Prime95 2023-04-12 03:20

[QUOTE=Yves Gallot;628174]
If g = 1 we have R = 2[SUP]n[/SUP] - 1, Q' = 2[SUP]n+1[/SUP] - 1: two cyclic transforms of the same size with different weights.[/QUOTE]

Again GWNUM was not coded with "different weights" in mind. The carry propagation code de-weights, propagates carries, then re-weights. This code would need to de-weight with one set of weights and then re-weight with a different set of weights depending on the the expected next use of the multiplication result. Ugh. I'd probably also need new code to convert from one set of weights to another.

If GWNUM had been designed with this in mind from the beginning, the weights would be applied during the FFT and inverse FFT rather than the carry propagation code (though that would be slower).

[QUOTE=Mysticial;628241]CRT of cyclic (2^N - 1) and negacyclic (2^N + 1) is mathematically the same as the first radix 2 reduction of a normal FFT. The difference is where you perform the carryout.[/QUOTE]

And this is probably the easiest way forward. I'd use existing 2^2N-1 cyclic FFT code. New code would allow doing either the full 2^2N-1 cyclic or faster 2^N-1 cyclic or 2^N+1 negacyclic. Not trivial.


IIUC a roughly 28% speed improvement.


All times are UTC. The time now is 14:03.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.