mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Operation Billion Digits (https://www.mersenneforum.org/forumdisplay.php?f=50)
-   -   Mfactor sieve code thread (https://www.mersenneforum.org/showthread.php?t=4939)

fatphil 2005-11-09 14:25

[QUOTE=jasonp]
Phil, before you spend as much time on this as I did
[/QUOTE]

Too late :-)

Last night I came up with 3 slightly different routines, including one looking identical to yours, each of which could possibly be fastest depending on quantity of unrolling and related register usage. Only testing would tell me which would be best in reality...
However, all are untested as I don't have access to my G5 presently, I simply scribbled some example code on my craptop. It'll be 2 weeks before I get to see my G5 again.

[QUOTE=jasonp]
Major bonus points to anybody who can figure out how to manipulate the FPU rounding mode so that the final correction at the end isn't necessary. Just setting the rounding mode to truncate the quotient still produces a negative remainder sometimes.

Happy crunching,
jasonp[/QUOTE]

Is the routine to set the rounding mode in the ppc_intrinsics header file?
I had assumed no correction would be necessary, I shall look into it when I do finally power up gcc/gdb on the old crate. Of course, the correction isn't necessary a large proportion of the time for what most people are doing, I'd bet.

Oh - props for msieve - beautiful code.

jasonp 2005-11-09 15:40

[QUOTE=fatphil]Too late :-)

Last night I came up with 3 slightly different routines, including one looking identical to yours, each of which could possibly be fastest depending on quantity of unrolling and related register usage. Only testing would tell me which would be best in reality...
[/QUOTE]
These algorithms can be especially useful on the G4, where an integer modmul would have excruciating latency. Let me know if you'd be interested in a vectorized version that uses software pipelining.

I forgot to mention in my previous message that because the FPU handles signed inputs automatically, a or b can be negative, and can also exceed the modulus (as long as intermediate products can be represented exactly). This means that normalization isn't necessary if the above routine is the core of a modular exponentiation. Also, for things like integer FFTs where the input is the result of a subtraction, you don't have to normalize the input before going straight into the multiply code.

[QUOTE]
Is the routine to set the rounding mode in the ppc_intrinsics header file?
I had assumed no correction would be necessary, I shall look into it when I do finally power up gcc/gdb on the old crate. Of course, the correction isn't necessary a large proportion of the time for what most people are doing, I'd bet.

Oh - props for msieve - beautiful code.[/QUOTE]
Thanks, it was great fun to write.

I never looked seriously at the intrinsics for rounding mode; asm() and a powerpc manual were sufficient :)

jasonp

ewmayer 2005-11-09 20:00

Thanks for the sample code, Jason - nice to see that vendor-provided intrinsics files are becoming more and more common - less futzing around with ASM macros, which is especially useful in early stages of code development.

I'm working out how all this can be used to speed a > 64-bit modmul - using pairs of 50-bit floats one could handle moduli of 100 bits (101 using balanced-digit for the LSW of each pair), which would be a nice alternative (or complement) to, say, a 96 or 128-bit pure-integer modmul. In that context the possibly negative lower-half outputs you mention are not a problem - unless one considers getting the digit-balancing for free a problem, that is.

ValerieVonck 2005-11-27 10:58

Anyone got a Windows 32 bit build?

Best Regards.

gribozavr 2006-02-10 20:26

Win32 binaries are available here: [url]http://ohmdpp.5gigs.com/files/mfactor/Mfactor_i686_win32_bin.tar.gz[/url]
32-bit linux binaries: [url]http://ohmdpp.5gigs.com/files/mfactor/Mfactor_i686_linux_bin.tar.gz[/url]

ValerieVonck 2006-02-10 20:42

Thnx!

em99010pepe 2006-02-11 17:48

[quote=gribozavr]Win32 binaries are available here: [URL="http://ohmdpp.5gigs.com/files/mfactor/Mfactor_i686_win32_bin.tar.gz"]http://ohmdpp.5gigs.com/files/mfactor/Mfactor_i686_win32_bin.tar.gz[/URL]
32-bit linux binaries: [URL="http://ohmdpp.5gigs.com/files/mfactor/Mfactor_i686_linux_bin.tar.gz"]http://ohmdpp.5gigs.com/files/mfactor/Mfactor_i686_linux_bin.tar.gz[/URL][/quote]

Thanks.

Carlos

Joshua2 2006-03-21 05:33

How does the speed of mfactor compare to factor4 on 100 million or billion digit numbers on 32 bit windows athlon non-sse2? If this is too specific then what is known about its speed? Should I switch from using factor4?

T.Rex 2006-03-21 20:44

Source code of MFactor
 
[QUOTE=ewmayer]... my Mfactor mersenne-sieving code ...[/QUOTE]I cannot find the source code. Where is it ?
It does not appear in this [URL="http://www.mersenneforum.org/showthread.php?t=3255"]thread[/URL].
Thanks,
T.

Joshua2 2006-03-22 04:44

[QUOTE=Joshua2]How does the speed of mfactor compare to factor4 on 100 million or billion digit numbers on 32 bit windows athlon non-sse2? If this is too specific then what is known about its speed? Should I switch from using factor4?[/QUOTE]
I did some limited testing and it appears that mfacor is significantly faster. Is it ok to use? (ie not missing factors)

ewmayer 2006-03-22 23:43

[QUOTE=Joshua2]I did some limited testing and it appears that mfacor is significantly faster. Is it ok to use? (ie not missing factors)[/QUOTE]
I've just posted an updated Win32 binary (see link in post at top of thread), which fixes a bug that could cause some factors to be missed. The bug only affects the floating-based factoring routines, i.e. only the Win32 build had this problem.

Dmitri (a.k.a. gribozavr), please update any of your Win-binary links (and local copies of the Win32 binary) as needed.

Happy hunting,
-Ernst


All times are UTC. The time now is 09:25.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.