#12 |
∂²ω=0
Sep 2002
República de California
19×613 Posts |
I should add one caveat to my assertions above: if a significant fraction of the time for a transform-based squaring is spent in the DWT-weighting and output-normalization steps, then there might indeed be a modest savings from doing a longer FFT with smaller input words, which would allow 2 or more dyadic multiplies in a row.
I haven't done precise timings in a while, but I seem to recall that the typical overhead for the DWT-weight/unweight and normalize-and-propagate-carries steps is 10-20% of the overall modmul time. For my Mlucas code, there is a roughly 10% additional overhead in Mersenne-mod mode due to the "wrapper step" needed around the dyadic-mul, which converts the outputs of the fundamentally complex-input FFT I use into ones reflecting a real-vector FFT. (I didn't want to code 2 separate FFTs for e.g. Fermat-mod and Mersenne-mod arithmetic... Fermat-mod uses a straight complex FFT if one uses the right-angle-transform weighting trick.)
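To make the "wrapper step" idea concrete, here is a minimal sketch of the generic textbook technique for getting a real-vector FFT out of a half-length complex FFT. It is not taken from Mlucas, which fuses the wrapper with the dyadic-mul instead of running it as a separate pass; the function names `fft`, `real_fft_via_complex` and `naive_dft` are hypothetical, and the recursive FFT is only there to keep the example self-contained.

```python
import cmath

def fft(a):
    """Recursive radix-2 forward complex FFT; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_fft_via_complex(x):
    """Return X[0..N/2] for a real length-N vector x (N a power of 2, N >= 2),
    using a single complex FFT of length N/2 plus a post-processing pass."""
    n = len(x)
    m = n // 2
    # Pack adjacent real pairs into one complex input: z[j] = x[2j] + i*x[2j+1].
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(m)]
    Z = fft(z)
    X = [0j] * (m + 1)
    for k in range(m):
        zc = Z[(m - k) % m].conjugate()
        e = (Z[k] + zc) / 2        # transform of the even-indexed reals
        o = (Z[k] - zc) / 2j       # transform of the odd-indexed reals
        X[k] = e + cmath.exp(-2j * cmath.pi * k / n) * o
    # Nyquist term: E[0] - O[0], where E[0] = Re(Z[0]) and O[0] = Im(Z[0]).
    X[m] = complex(Z[0].real - Z[0].imag, 0.0)
    return X

def naive_dft(x):
    """O(N^2) reference DFT, used only to check the result."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

if __name__ == "__main__":
    x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
    X = real_fft_via_complex(x)
    ref = naive_dft(x)
    print(max(abs(X[k] - ref[k]) for k in range(len(X))))
```

The post-processing loop is a single O(N) pass over the transform outputs, and that extra pass over the data is the kind of cost the roughly 10% figure above refers to. The remaining outputs X[N/2+1..N-1] follow from conjugate symmetry of a real-input transform.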
#13 |
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²×3×641 Posts |