mersenneforum.org > Math Montgomery Multiplication
 Register FAQ Search Today's Posts Mark Forums Read

 2004-12-24, 00:43 #1 dave_dm   May 2004 24·5 Posts Montgomery Multiplication I'm considering writing a 32-bit modular Montgomery reduction routine. This obviously needs to be hand-coded in asm, so experience tells me I should probably ask around first. On x86 hardware, the naive method of reducing a 64-bit integer modulo a 32-bit integer uses a single DIV instruction which IIRC takes about 17 cycles. When doing a Montgomery reduction, we do two multiplies (3 cycles each), an addition and a contitional subtraction. I assume that it's worth doing some SBB magic to work around the conditional flushing of cache lines every so often. So, that all done, I can't really see 32-bit Montgomery reduction being faster than a DIV instruction. Has anyone tried this out? Dave
 2004-12-24, 05:18 #2 axn     Jun 2003 113628 Posts I think the DIV takes 37 cycles, not 17 (not sure)
 2004-12-24, 11:00 #3 dave_dm   May 2004 24×5 Posts I had a look in the AMD White Paper 22007 and it seems to say that a 32-bit division has a latency of 40 cycles (=lots). So your figure of 37 is probably right for some particular species of x86. The figure of 17 seems to be only of historical interest as it's the latency of a 8-bit division. Eugh! Dave

 Similar Threads Thread Thread Starter Forum Replies Last Post mickfrancis Computer Science & Computational Number Theory 3 2015-06-25 14:32 R.D. Silverman Factoring 8 2014-06-07 18:43 SPWorley Math 5 2009-08-18 17:27 T.Rex Math 10 2006-04-18 06:57 akruppa Hardware 1 2005-08-04 10:25

All times are UTC. The time now is 09:51.

Thu Jan 28 09:51:32 UTC 2021 up 56 days, 6:02, 0 users, load averages: 3.72, 3.44, 2.91