mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > GMP-ECM

Reply
 
Thread Tools
Old 2022-10-19, 11:00   #1
WraithX
 
WraithX's Avatar
 
Mar 2006

10268 Posts
Default Who can implement ECM Stage 2 on a GPU?

I'd like to ask the forum for help in finding someone to implement ECM Stage 2 on a GPU.

Can anyone here on the forum do this?

Can we reach out to a mathematician or software engineer who can?

Can we reach out to a grad student or professor who can?

Is it possible to hire someone to write this code? What would be a reasonable offer for a request like this?

I know we have a lot of talent here on the forum, and a lot of connections and contacts with talented individuals outside of the forum. Hopefully we can find someone who is up to the task!

I've asked Paul Zimmermann, the creator and maintainer of GMP ECM, and he recommended I ask here on the forum.

He also mentioned that the state-of-the-art algorithms for stage 2 are described in "20 Years of ECM".

I'm very interested to see what can be accomplished and how well it will work compared to the CPU implementation.

Thanks in advance for any help you can give, either in the search for a developer or in creating this code!
WraithX is offline   Reply With Quote
Old 2022-10-19, 15:07   #2
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

5·23·71 Posts
Default

I've been working off-and-on to get state-of-the-art stage 2 into prime95. I can be a little obsessive about optimizing.

The 20 years of ECM paper can be a bit daunting. Prime95 is using a new trick that helps with ECM on big Mersennes - it may or may not help with tiny Cunningham numbers.

The problem boils down to writing fast polynomial multiplication code using schoolboy n^2 or FFT n*log(n) techniques. You'll start with a large number of length-1 monic polynomials, multiply pairs to get length-2 monics, multiply pairs to get length-4, etc. Once you have one big poly, make a second and third big poly. These are multiplied together modulo the first poly (which is three more poly multiplications). Finally, the big poly is reduced back to a lot of length-1 monic polys (see a paper by Daniel J Bernstein -- https://cr.yp.to/arith/scaledmod-20040820.pdf). You guessed it, this reduction is more poly multiplications. Finally you take a GCD.
Prime95 is offline   Reply With Quote
Old 2022-10-19, 19:35   #3
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

479 Posts
Default

Last year I re-wrote ecm's stage 1 GPU code. This was mostly a much simpler effort; taking existing working code and slowly replacing the hand coded grad-school o(n^2) multiplication with CGBN (see Faster GPU-ECM with CGBN). The resulting stage 1 code is quite compact about 100 lines of code (See cgbn_stage1.cu#L229-L330). Obviously it took a lot of frustrating, debugging, and re-checking of the references to implement this.

When I started it had already been proved that stage 1 on the GPU was possible (even competitive with CPU) so I had a strong sense this would work (I benchmarked multiplications per second and CGBN was 25x faster).

What sized inputs are you most interested in? If it's <8,000 bit numbers then utilizing CGBN you may avoid having to re-write gwnum or GMP. If you want arbitrary sized inputs then this is a bigger ask.

A quick google search didn't turn up any existing GPU stage 2 implementations, have you seen any existing implementations?
Are there any simple reference implementations for stage 2?

I'd personally be interested in collaborating on a simple implementation of stage 2, but I doubt it would be remotely competitive with an optimized (and algorithmically advantaged) CPU implementation.
SethTro is offline   Reply With Quote
Old 2022-10-19, 20:45   #4
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

47910 Posts
Default

I found two papers referencing GPU based stage 2.

https://eprint.iacr.org/2014/397.pdf - Cofactorization on Graphics Processing Units

https://eprint.iacr.org/2020/1265.pdf - Revisiting ECM on GPUs

Quote:
ECM Stage 2 on GPUs. The fastest known methods to implement stage 2 of ECM are
FFT-based [12, 36, 37] and rather memory-hungry, which may explain why earlier constrained
device ECM-cofactoring work did not consider stage 2. These methods are also incompatible
with the memory restrictions of current GPUs.

These papers both target small numbers (<512 bits) and small bounds (B2=50000). Both seem to adapt the baby-step giant-step approach. It's not clear (to me) what memory requirements would be needed for the parallel computation of X,000 curves with large stage 2 bounds.
SethTro is offline   Reply With Quote
Old 2022-10-19, 22:07   #5
WraithX
 
WraithX's Avatar
 
Mar 2006

53410 Posts
Default

George, thank you for the link to the Bernstein paper! I can see that being a huge help to this project.

Seth, I am thinking of running Stage 2 on numbers with fewer than 8000 bits. My range of interest would probably be around 1000 bits or fewer.

I wasn't envisioning using GPU Stage 2 on multiple curves at once. I was thinking of using the GPU to speed up stage 2 of a single curve. (though I wouldn't refuse an implementation that could do multiple curves at once!) I know some B2 bounds can lead to very large memory requirements in the cpu implementation, which is why I was thinking of working on a single curve at a time. Though there is also the "-k" option in GMP-ECM which breaks the calculation into smaller chunks which use less memory, so we could investigate implementing that option, too.

Perhaps we could start a collaboration and create the code here and run ideas past each other to make sure we're on the right track? I think starting with the Bernstein paper and having a good implementation of that (perhaps in CGBN) might be a good place to start. How does this sound?
WraithX is offline   Reply With Quote
Old 2022-10-19, 22:42   #6
SethTro
 
SethTro's Avatar
 
"Seth"
Apr 2019

479 Posts
Default

Quote:
Originally Posted by WraithX View Post
George, thank you for the link to the Bernstein paper! I can see that being a huge help to this project.

Seth, I am thinking of running Stage 2 on numbers with fewer than 8000 bits. My range of interest would probably be around 1000 bits or fewer.

I wasn't envisioning using GPU Stage 2 on multiple curves at once. I was thinking of using the GPU to speed up stage 2 of a single curve. (though I wouldn't refuse an implementation that could do multiple curves at once!) I know some B2 bounds can lead to very large memory requirements in the cpu implementation, which is why I was thinking of working on a single curve at a time. Though there is also the "-k" option in GMP-ECM which breaks the calculation into smaller chunks which use less memory, so we could investigate implementing that option, too.

Perhaps we could start a collaboration and create the code here and run ideas past each other to make sure we're on the right track? I think starting with the Bernstein paper and having a good implementation of that (perhaps in CGBN) might be a good place to start. How does this sound?
I enjoy developing code and doing it with others is even better; Feel free to email me if that a better flow too.

I'll read Scaled Remainder Trees this week with an eye on adapting it for the GPU.

Last fiddled with by SethTro on 2022-10-19 at 22:42
SethTro is offline   Reply With Quote
Old 2022-10-22, 18:45   #7
WraithX
 
WraithX's Avatar
 
Mar 2006

2·3·89 Posts
Default

I realized that we probably don't need to build everything from the ground up. We can probably look at porting the existing gmp-ecm code over to a gpu framework. We can discuss details over email.

I'm sure we'll post updates here, but if anyone else is interested in contributing to this project please let us know!
WraithX is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to use prime95 for stage 1 & GMP-ECM for stage 2 Prime95 Lone Mersenne Hunters 118 2022-07-04 18:19
trying to implement block lanczos on GF2... paul0 Math 8 2020-10-08 04:09
Stage 1 with mprime/prime95, stage 2 with GMP-ECM D. B. Staple Factoring 2 2007-12-14 00:21
Need help to run stage 1 and stage 2 separately jasong GMP-ECM 9 2007-10-25 22:32
Stage 1 and stage 2 tests missing Matthias C. Noc PrimeNet 5 2004-08-25 15:42

All times are UTC. The time now is 04:07.


Tue Feb 7 04:07:42 UTC 2023 up 173 days, 1:36, 1 user, load averages: 0.98, 1.05, 1.05

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔