
jasonp 2012-02-11 13:15

Would you use a 'fat binary' of GMP-ECM?
I've never liked all the mental effort that everyone here expends to get the perfect GMP-ECM binary that runs at optimal speed. It seems there are a dozen different sources of compiled binaries that are tweaked for this platform or that. In contrast, GMP switched to a 'fat binary' approach many years ago, where for x86 the build process compiles all the assembly code for all architectures and a runtime dispatcher chooses the bundles of code that are called. This is slightly slower than using a binary with just one set of assembly code compiled in, but you don't need to figure out any perfect combinations for you or your userbase.

The reason I mention this is that I'm currently performing an overhaul of some of the plumbing in GMP-ECM, and it's becoming clear that parts of the library are leaving a fair amount of performance on the table because they can only make one choice of how to do things at runtime.

For example, stage 2 performs number-theoretic FFTs that have many internal bells and whistles. If you use SSE2 then these FFTs run 30% faster on 32-bit platforms, so the build process detects SSE2 automatically and turns it on. However, on 32-bit platforms there is a choice of how you do modular multiplication, and the compiled-in choice is the [i]second[/i] fastest (it's 10% slower), because the fastest choice would severely limit how big a stage 2 you can do. Why not just build in both, and choose the fastest one at runtime? A development branch of the library that I'm working on also has another set of choices that remove any limits on how big a stage 2 your 32-bit machine can handle, at a cost of (currently) 1.5x in performance. If this was the default then nobody would be happy, but if you don't use it then your 32-bit machine will only ever be able to handle smaller problems.
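A minimal sketch of the "build in both and choose at runtime" idea, in C. All names here (`mulmod_fp`, `mulmod_wide`, `select_mulmod`) and the 2^26 cutoff are illustrative assumptions, not GMP-ECM's actual code. The restricted routine stands in for a fast path that only works when the problem is small enough, and the general routine for the slower path that always works.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*mulmod_fn)(uint32_t a, uint32_t b, uint32_t p);

/* Hypothetical cutoff: below 2^26 the product a*b fits exactly in a double. */
#define MULMOD_FP_LIMIT (1u << 26)

/* General variant: exact for any nonzero 32-bit modulus, via a 64-bit product. */
uint32_t mulmod_wide(uint32_t a, uint32_t b, uint32_t p)
{
    return (uint32_t)(((uint64_t)a * b) % p);
}

/* Restricted variant: floating-point quotient estimate.
 * Requires p < 2^26 and a, b < p, so a*b < 2^52 is exact in a double. */
uint32_t mulmod_fp(uint32_t a, uint32_t b, uint32_t p)
{
    assert(p < MULMOD_FP_LIMIT && a < p && b < p);
    int64_t ab = (int64_t)a * (int64_t)b;
    int64_t q = (int64_t)((double)ab / (double)p); /* off by at most 1 */
    int64_t r = ab - q * (int64_t)p;
    if (r < 0)
        r += p;
    if (r >= (int64_t)p)
        r -= p;
    return (uint32_t)r;
}

/* Pick the restricted variant whenever this modulus allows it, else fall back. */
mulmod_fn select_mulmod(uint32_t p)
{
    return (p < MULMOD_FP_LIMIT) ? mulmod_fp : mulmod_wide;
}
```

The caller would run `select_mulmod` once per problem and keep the returned pointer, so the choice costs nothing inside the inner loops beyond an indirect call.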

The stage 1 code is much more CPU-specific and would clearly benefit from a CPU-dispatcher; it can just compile all the various redc versions plus a generic one, then put them in a table at library start time and look in the table when it knows which machine it's running on.

The problem is that building all of this is a great deal of work for the developers, which is only worthwhile if the user community can resist the temptation to 'roll their own' anyway. This has definitely worked with GMP, in part because building GMP on your own is sometimes incredibly painful, especially on Windows, but also because there isn't much point in building your own anymore, and your own products can ship a generic library that will work well on anything your users have.

So if we added this to the next version of GMP-ECM and the version that shipped with Ubuntu (or whatever) suddenly gets 95% of the performance of figuring out your own perfect set of compile options, would that be good enough for you?

axn 2012-02-11 14:34

So I get 10% faster stage 1, and 0% faster stage 2, and a potential 5% performance loss overall ( due to not-quite-perfect compile options ), I should still come out ahead, right (assuming 3:1 stage1/stage2 split)? What am I missing? :unsure:
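One way to work that estimate through, under assumptions that are not stated in the post: the thin binary's total time is normalized to 1 with stage 1 taking 0.75 of it, "10% faster" means stage 1 throughput rises by a factor of 1.10, and the 5% penalty means the whole run reaches 95% of the tuned throughput.

$$T_{\text{fat}} = \frac{1}{0.95}\left(\frac{0.75}{1.10} + 0.25\right) \approx \frac{0.932}{0.95} \approx 0.981$$

Under those assumptions the fat binary would still finish about 2% sooner than the thin one.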

xilman 2012-02-11 15:25

[QUOTE=axn;289040]So I get 10% faster stage 1, and 0% faster stage 2, and a potential 5% performance loss overall ( due to not-quite-perfect compile options ), I should still come out ahead, right (assuming 3:1 stage1/stage2 split)? What am I missing? :unsure:[/QUOTE]
An encouragement to build your own binary which is optimum for your own machine and/or your own interests?

I've always found it extremely easy to build ECM but, there again, I try not to use Windoze.

ATH 2012-02-11 17:16

This is another option for "windoze" users: [URL=""][/URL]

It's only tested by 2 people that I know of, but it seems to work.

yoyo 2012-02-11 20:36

I would be really happy to get a fat binary. I send GMP-ECM to all my BOINC users, who have all kinds of systems.

debrouxl 2012-02-11 20:53

Well, if the performance penalty of using this (currently hypothetical) fat GMP-ECM binary is really 5% compared to the performance of a CPU family-specific binary, large-scale ECM users (bdodson et al.) are unlikely to use a fat binary, are they?

Should the task of a fat GMP-ECM binary be undertaken, I guess that all of us could happily contribute CPU power to further tuning (if necessary, of course), on various CPU families :smile:

chris2be8 2012-02-12 17:20

Am I correct in assuming that the fat binary would decide which CPU it was running on at startup, so the overhead would be very small (milliseconds at most)? It would presumably be a lot larger on disk, but only the code for the current CPU would be loaded, so the memory overhead would be small.

If that's true I'm all for it. And if it can choose the best code path for the CPU, target size, B1, B2, etc., it could easily run faster than the thin binary.

The only case I can think of for the thin binary is if disk space is limited, e.g. diskless systems with just a little flash memory to boot off.

Chris K

xilman 2012-02-12 21:19

I just voted. Took me so long because I [b]am[/b] "my guy" and I didn't know which of the two to vote for.

jasonp 2012-02-12 22:25

Large-scale ECM users always need special considerations, since 5% of the time of thousands of machines can safely be called valuable.

Usually a fat library has a hook into the application startup process that identifies the CPU and instruction set only once, then fills a global table with pointers to the functions chosen for that run. For the rest of the run, the only difference is that a function call goes through a table lookup instead of jumping to an address known at compile time.
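A rough sketch of that startup-table pattern in C. Everything named here (`detect_cpu`, `dispatch_init`, `g_dispatch`, the `vec_add_*` routines) is hypothetical and chosen for illustration; it is not how GMP or GMP-ECM spell it. The detection function is a placeholder that always reports a generic CPU, where a real library would query CPUID or an equivalent.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { CPU_GENERIC = 0, CPU_TUNED, CPU_KIND_COUNT } cpu_kind;

typedef void (*vec_add_fn)(uint32_t *dst, const uint32_t *src, size_t n);

/* Portable fallback that runs anywhere. */
void vec_add_generic(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Stand-in for a CPU-specific version (would be assembly or intrinsics). */
void vec_add_tuned(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        dst[i] += src[i];
        dst[i + 1] += src[i + 1];
    }
    for (; i < n; i++)
        dst[i] += src[i];
}

/* Every variant that was compiled in, indexed by CPU kind. */
static const vec_add_fn vec_add_variants[CPU_KIND_COUNT] = {
    vec_add_generic, vec_add_tuned
};

/* Global table consulted by the rest of the library. */
struct dispatch_table { vec_add_fn vec_add; };
struct dispatch_table g_dispatch;

/* Placeholder: always reports a generic CPU. */
cpu_kind detect_cpu(void)
{
    return CPU_GENERIC;
}

/* Run once at startup; later calls are no-ops. Not thread-safe as written. */
void dispatch_init(void)
{
    if (g_dispatch.vec_add != NULL)
        return;
    g_dispatch.vec_add = vec_add_variants[detect_cpu()];
}
```

After `dispatch_init()` has run, callers write `g_dispatch.vec_add(dst, src, n)` instead of calling a fixed symbol, which is the single indirect call described above.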

All the headaches with this are in the build process, not at runtime. Msieve has a fat binary architecture when running QS, and it does that by taking the same source file and compiling it over and over again with different arguments, plus different names for the top-level functions in each architecture. Doing the same for GMP-ECM means lots of contortions in automake, plus trying to make the Visual Studio builds do the same thing, which may not be possible.
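The "same source, different names" trick can be imitated inside a single C file with token pasting. This is only a sketch under assumptions: the macro names, the `sum_squares` kernel, and the suffixes are invented for illustration and are not taken from Msieve. In a real build the kernel body would live in its own source file, compiled once per target with a different suffix macro and different architecture flags, rather than being stamped out by a macro as here.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-level paste so macro arguments are expanded before joining. */
#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)

/* Stand-in for the shared source file: one body, emitted under a
 * per-variant name. STEP imitates a per-architecture tuning knob. */
#define DEFINE_KERNEL(SUFFIX, STEP)                                  \
    uint64_t CAT(sum_squares, SUFFIX)(const uint32_t *v, size_t n)   \
    {                                                                \
        uint64_t acc = 0;                                            \
        size_t i = 0;                                                \
        for (; i + (STEP) <= n; i += (STEP))                         \
            for (size_t j = 0; j < (STEP); j++)                      \
                acc += (uint64_t)v[i + j] * v[i + j];                \
        for (; i < n; i++)                                           \
            acc += (uint64_t)v[i] * v[i];                            \
        return acc;                                                  \
    }

/* Each "compilation" yields a distinct top-level symbol. */
DEFINE_KERNEL(_generic, 1)
DEFINE_KERNEL(_unroll4, 4)
```

Both `sum_squares_generic` and `sum_squares_unroll4` then exist side by side in the binary, ready to be placed in a dispatch table.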
