mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Operation Billion Digits

Reply
 
Thread Tools
Old 2005-11-09, 14:25   #12
fatphil
 
fatphil's Avatar
 
May 2003

3×7×11 Posts
Default

Quote:
Originally Posted by jasonp
Phil, before you spend as much time on this as I did
Too late :-)

Last night I came up with 3 slightly different routines, including one looking identical to yours, each of which could possibly be fastest depending on quantity of unrolling and related register usage. Only testing would tell me which would be best in reality...
However, all are untested as I don't have access to my G5 presently, I simply scribbled some example code on my craptop. It'll be 2 weeks before I get to see my G5 again.

Quote:
Originally Posted by jasonp
Major bonus points to anybody who can figure out how to manipulate the FPU rounding mode so that the final correction at the end isn't necessary. Just setting the rounding mode to truncate the quotient still produces a negative remainder sometimes.

Happy crunching,
jasonp
Is the routine to set the rounding mode in the ppc_intrinsics header file?
I had assumed no correction would be necessary, I shall look into it when I do finally power up gcc/gdb on the old crate. Of course, the correction isn't necessary a large proportion of the time for what most people are doing, I'd bet.

Oh - props for msieve - beautiful code.
fatphil is offline   Reply With Quote
Old 2005-11-09, 15:40   #13
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

22×3×293 Posts
Default

Quote:
Originally Posted by fatphil
Too late :-)

Last night I came up with 3 slightly different routines, including one looking identical to yours, each of which could possibly be fastest depending on quantity of unrolling and related register usage. Only testing would tell me which would be best in reality...
These algorithms can be especially useful on the G4, where an integer modmul would have excruciating latency. Let me know if you'd be interested in a vectorized version that uses software pipelining.

I forgot to mention in my previous message that because the FPU handles signed inputs automatically, a or b can be negative, and can also exceed the modulus (as long as intermediate products can be represented exactly). This means that normalization isn't necessary if the above routine is the core of a modular exponentiation. Also, for things like integer FFTs where the input is the result of a subtraction, you don't have to normalize the input before going straight into the multiply code.

Quote:
Is the routine to set the rounding mode in the ppc_intrinsics header file?
I had assumed no correction would be necessary, I shall look into it when I do finally power up gcc/gdb on the old crate. Of course, the correction isn't necessary a large proportion of the time for what most people are doing, I'd bet.

Oh - props for msieve - beautiful code.
Thanks, it was great fun to write.

I never looked seriously at the intrinsics for rounding mode; asm() and a powerpc manual were sufficient :)

jasonp
jasonp is offline   Reply With Quote
Old 2005-11-09, 20:00   #14
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
Rep├║blica de California

252318 Posts
Default

Thanks for the sample code, Jason - nice to see that vendor-provided intrinsics files are becoming more and more common - less futzing around with ASM macros, which is especially useful in early stages of code development.

I'm working out how all this can be used to speed a > 64-bit modmul - using pairs of 50-bit floats one could handle moduli of 100 bits (101 using balanced-digit for the LSW of each pair), which would be a nice alternative (or complement) to, say, a 96 or 128-bit pure-integer modmul. In that context the possibly negative lower-half outputs you mention are not a problem - unless one considers getting the digit-balancing for free a problem, that is.

Last fiddled with by ewmayer on 2005-11-09 at 20:01
ewmayer is offline   Reply With Quote
Old 2005-11-27, 10:58   #15
ValerieVonck
 
ValerieVonck's Avatar
 
Mar 2004
Belgium

33×31 Posts
Default

Anyone got a Windows 32 bit build?

Best Regards.

Last fiddled with by ValerieVonck on 2005-11-27 at 10:58
ValerieVonck is offline   Reply With Quote
Old 2006-02-10, 20:26   #16
gribozavr
 
gribozavr's Avatar
 
Mar 2005
Internet; Ukraine, Kiev

40710 Posts
Default

Win32 binaries are available here: http://ohmdpp.5gigs.com/files/mfacto...n32_bin.tar.gz
32-bit linux binaries: http://ohmdpp.5gigs.com/files/mfacto...nux_bin.tar.gz

Last fiddled with by gribozavr on 2006-02-10 at 20:26
gribozavr is offline   Reply With Quote
Old 2006-02-10, 20:42   #17
ValerieVonck
 
ValerieVonck's Avatar
 
Mar 2004
Belgium

83710 Posts
Default

Thnx!
ValerieVonck is offline   Reply With Quote
Old 2006-02-11, 17:48   #18
em99010pepe
 
em99010pepe's Avatar
 
Sep 2004

2·5·283 Posts
Default

Quote:
Originally Posted by gribozavr
Thanks.

Carlos
em99010pepe is offline   Reply With Quote
Old 2006-03-21, 05:33   #19
Joshua2
 
Joshua2's Avatar
 
Sep 2004

13·41 Posts
Default

How does the speed of mfactor compare to factor4 on 100 million or billion digit numbers on 32 bit windows athlon non-sse2? If this is too specific then what is known about its speed? Should I switch from using factor4?

Last fiddled with by Joshua2 on 2006-03-21 at 05:33
Joshua2 is offline   Reply With Quote
Old 2006-03-21, 20:44   #20
T.Rex
 
T.Rex's Avatar
 
Feb 2004
France

911 Posts
Default Source code of MFactor

Quote:
Originally Posted by ewmayer
... my Mfactor mersenne-sieving code ...
I cannot find the source code. Where is it ?
It does not appear in this thread.
Thanks,
T.
T.Rex is offline   Reply With Quote
Old 2006-03-22, 04:44   #21
Joshua2
 
Joshua2's Avatar
 
Sep 2004

13×41 Posts
Default

Quote:
Originally Posted by Joshua2
How does the speed of mfactor compare to factor4 on 100 million or billion digit numbers on 32 bit windows athlon non-sse2? If this is too specific then what is known about its speed? Should I switch from using factor4?
I did some limited testing and it appears that mfacor is significantly faster. Is it ok to use? (ie not missing factors)
Joshua2 is offline   Reply With Quote
Old 2006-03-22, 23:43   #22
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
Rep├║blica de California

3·5·727 Posts
Default

Quote:
Originally Posted by Joshua2
I did some limited testing and it appears that mfacor is significantly faster. Is it ok to use? (ie not missing factors)
I've just posted an updated Win32 binary (see link in post at top of thread), which fixes a bug that could cause some factors to be missed. The bug only affects the floating-based factoring routines, i.e. only the Win32 build had this problem.

Dmitri (a.k.a. gribozavr), please update any of your Win-binary links (and local copies of the Win32 binary) as needed.

Happy hunting,
-Ernst
ewmayer is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Easiest working Quadratic Sieve code? skan Programming 4 2018-03-24 09:54
Sieve Benchmark Thread Historian Twin Prime Search 105 2013-02-05 01:35
Factor5 source code thread ET_ Operation Billion Digits 10 2008-09-17 12:28
Where I should write C code (thread moved) maqableh Programming 9 2006-05-12 16:22
New Sieve Thread Discussion Citrix Prime Sierpinski Project 15 2005-08-29 13:56

All times are UTC. The time now is 05:17.

Wed Apr 1 05:17:44 UTC 2020 up 7 days, 2:50, 0 users, load averages: 0.90, 1.27, 1.37

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.