2006-08-01
Thanks for the help, everyone.

From the looks of it, in 64-bit mode the generic C mulmod using the FPU is much faster than the assembler versions for both 32-bit and 64-bit arguments. In 32-bit mode, the assembler integer version is a little faster for 32-bit arguments, as is the assembler FPU version for 64-bit arguments.
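
For anyone reading along later, here is a rough sketch of what a generic C mulmod using the FPU can look like. It is not the code that was benchmarked in this thread; the names mulmod_fpu and mulmod_slow are made up for illustration. It assumes a, b < p < 2^50 so that a plain double (53-bit mantissa) gets the quotient right to within 1, and it assumes the usual two's-complement wraparound when converting to int64_t. On x87 the same idea is commonly used with long double to handle larger moduli, but that depends on the platform's long double format.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: (a * b) mod p with the quotient estimated in floating point.
 * Assumption: a, b < p < 2^50, so the double estimate of floor(a*b/p)
 * is off by at most 1 and a single correction step is enough. */
static uint64_t mulmod_fpu(uint64_t a, uint64_t b, uint64_t p)
{
    assert(p > 0 && p < (UINT64_C(1) << 50));
    assert(a < p && b < p);

    /* Approximate quotient; may differ from the true floor by 1. */
    uint64_t q = (uint64_t)((double)a * (double)b / (double)p);

    /* a*b and q*p both wrap modulo 2^64, but their true difference lies
     * in (-p, 2p), so the wrapped difference is exact once it is
     * reinterpreted as signed (two's complement assumed). */
    int64_t r = (int64_t)(a * b - q * p);

    if (r < 0)
        r += (int64_t)p;        /* estimate was one too high */
    else if (r >= (int64_t)p)
        r -= (int64_t)p;        /* estimate was one too low */
    return (uint64_t)r;
}

/* Slow shift-and-add reference for cross-checking; valid for p < 2^63. */
static uint64_t mulmod_slow(uint64_t a, uint64_t b, uint64_t p)
{
    uint64_t r = 0;
    a %= p;
    while (b) {
        if (b & 1) {
            r += a;
            if (r >= p) r -= p;
        }
        a += a;
        if (a >= p) a -= p;
        b >>= 1;
    }
    return r;
}
```

For example, mulmod_fpu(3, 4, 5) returns 2. Comparing against mulmod_slow on random inputs is a cheap way to check that the precision assumption holds on a given compiler and mode before timing anything.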
