mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

Reply
 
Thread Tools
Old 2011-11-28, 14:40   #1
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

5×7×139 Posts
Default CUDA algorithms

ECM on graphics cards (Bernstein - Chen - Cheng - Lange - Yang)
GMP implementations on CUDA (Liu - Tong)
Multi-precision modular multiplication on GPU (Zhao)
The billion-Mulmod-per-second PC (Bernstein - Chen - Chen - Cheng - Hsiao - Lange - in - Yang)

If you can provide more links, maybe this thread is worth a "sticky" bit.

Luigi

Last fiddled with by ET_ on 2011-11-28 at 15:12
ET_ is offline   Reply With Quote
Old 2011-11-28, 15:22   #2
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

1CAD16 Posts
Default

How about OpenCL algorithms?
rogue is offline   Reply With Quote
Old 2011-11-29, 01:03   #3
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

5·23·31 Posts
Default

For multiprecision arithmetic? The stuff you see here in the trial factoring on GPU section could well be all there is now.

Last fiddled with by jasonp on 2011-11-29 at 01:04
jasonp is offline   Reply With Quote
Old 2011-11-29, 01:47   #4
Christenson
 
Christenson's Avatar
 
Dec 2010
Monticello

5·359 Posts
Default Call for CUDA P-1 code

I make no claim to have this, but it seems to me that *IF* P-1 *could* benefit from GPU processing, that would be a significant benefit to GIMPS. I'd love to program it, but don't know if I can do it in a timely fashion.

I conjecture that a GPU implementation would also allow limits to be raised on numbers checked for Goldbach's conjecture by a few (3-4) orders of magnitude.

Finally, if ECM is efficient on GPUs, then it's useful bounds might well extend into the current DC range. Emotionally, I prefer factors to LL checks!

What about sieving for SNFS and GNFS (or even MPQS) on GPUs? I think that already exists....
Christenson is offline   Reply With Quote
Old 2011-11-29, 10:02   #5
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

486510 Posts
Default

Quote:
Originally Posted by Christenson View Post
I make no claim to have this, but it seems to me that *IF* P-1 *could* benefit from GPU processing, that would be a significant benefit to GIMPS. I'd love to program it, but don't know if I can do it in a timely fashion.
You are doing something big with mfaktc, please go ahead...
I'm using my spare time to study how to theorically work out something, and hope I'll have some guidelines by the time you'll release a Primenet enabled mfaktc...

Quote:
Originally Posted by Christenson View Post
I conjecture that a GPU implementation would also allow limits to be raised on numbers checked for Goldbach's conjecture by a few (3-4) orders of magnitude.
Why not Riemann Zeta function?


Quote:
Originally Posted by Christenson View Post
Finally, if ECM is efficient on GPUs, then it's useful bounds might well extend into the current DC range. Emotionally, I prefer factors to LL checks!
I think the main limitation for good ECM on GPUs is the memory constraint.
If (as it seems) EECM is less memory-hungry and more efficient in searching factors, then we may try to contact Bernstein's workgroup.

I'd like to know what Alex Kruppa thinks about it.

Quote:
Originally Posted by Christenson View Post
What about sieving for SNFS and GNFS (or even MPQS) on GPUs? I think that already exists....
Actually msieve uses GPU for polynomial selection. AFAIK, sieving on GFNS/SNFS implies heavy memory transfers, not a good characteristic for GPUs.

Luigi
ET_ is offline   Reply With Quote
Old 2011-11-29, 10:19   #6
LaurV
Romulan Interpreter
 
LaurV's Avatar
 
"name field"
Jun 2011
Thailand

41·251 Posts
Default

Assuming one has a Fermi (or two Fermi's ) with 3GB (each) or 6Gb (each, for Tesla's) and 384 bit bus for internal mem access (each). How would this influence the speed of P-1 and/or ECM? I would be willing to try, if any software available. No time and no knowledge to try to write my own. GPU's now are very different from the GPU's at the time when that papers were written. I don't believe the mem size or the mem access is a limitation anymore. When I run CudaLucas, the GPU's mem controller is anywhere between 30% and 40% busy, for a 95% GPU occupation, and when I run mfaktc (need 3 copies to max the GPU, i.e. to raise it to 95-98%) the mem controller is 1% busy. No joke. 1%! Here also there is a lot of place for improvement, I think. Like for example, use GPU for sieving too, and let the CPU free for other tasks. Contrary to CudaLucas, which does not need the CPU at all (almost) mfaktc is taking lots of CPU resources for sieving or whatever else, and meantime, GPU's mem is "wasted", sleeping, not used at all.

Last fiddled with by LaurV on 2011-11-29 at 10:28
LaurV is offline   Reply With Quote
Old 2011-11-29, 10:43   #7
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

5·7·139 Posts
Default

You put different ideas on the same message.

Quote:
Originally Posted by LaurV View Post
Assuming one has a Fermi (or two Fermi's ) with 3GB (each) or 6Gb (each, for Tesla's) and 384 bit bus for internal mem access (each). How would this influence the speed of P-1 and/or ECM?
No easy Brent-Suyama extension.
As for the speed, it all depends on both the algorithm and the level of parallelization.
I.E. Karatsuba multiplication has a better asymptotic efficiency than Montgomery multiplication, but the latter is much easier to parallelize.
FFT multiplication is asymptotically the fastest, but if you only rely on base 2 FFTs its benefical effects dim out.

Quote:
Originally Posted by LaurV View Post
I would be willing to try, if any software available. No time and no knowledge to try to write my own. GPU's now are very different from the GPU's at the time when that papers were written. I don't believe the mem size or the mem access is a limitation anymore.
It is.
You may have 6 GB memory, but threads that are allowed to use only 64K at a time. To feed the threads, much memory management should be done. Or use of global memory, with a slowdown of about 600x on accesses.

Quote:
Originally Posted by LaurV View Post
When I run CudaLucas, the GPU's mem controller is anywhere between 30% and 40% busy, for a 95% GPU occupation, and when I run mfaktc (need 3 copies to max the GPU, i.e. to raise it to 95-98%) the mem controller is 1% busy. No joke. 1%! Here also there is a lot of place for improvement, I think. Like for example, use GPU for sieving too, and let the CPU free for other tasks. Contrary to CudaLucas, which does not need the CPU at all (almost) mfaktc is taking lots of CPU resources for sieving or whatever else, and meantime, GPU's mem is "wasted", sleeping, not used at all.
Note that there are tasks that are better completed by CPUs, and other that are suited for GPUs.

Luigi

Last fiddled with by ET_ on 2011-11-29 at 10:43
ET_ is offline   Reply With Quote
Old 2011-11-29, 10:56   #8
axn
 
axn's Avatar
 
Jun 2003

125308 Posts
Default

Optimization of Primality Testing Methods by GPU
axn is offline   Reply With Quote
Old 2011-11-29, 10:59   #9
axn
 
axn's Avatar
 
Jun 2003

23×683 Posts
Default

Quote:
Originally Posted by LaurV View Post
Assuming one has a Fermi (or two Fermi's ) with 3GB (each) or 6Gb (each, for Tesla's) and 384 bit bus for internal mem access (each).
Are you talking in GIMPS context or generic P-1/ECM context (a la GMP-ECM)? The answers would differ based on the context.
axn is offline   Reply With Quote
Old 2011-11-29, 12:10   #10
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

5×7×139 Posts
Default

Quote:
Originally Posted by axn View Post
Thanks!


Last fiddled with by ET_ on 2011-11-29 at 12:10
ET_ is offline   Reply With Quote
Old 2011-11-29, 17:13   #11
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

5×7×139 Posts
Default

Another one:

Modular Reduction of Large Integers Using Classical, Barrett, Montgomery Algorithms

Not specifically CUDA-aware, but interesting, since TheJudger used Barrett algorithm for mfaktc...

Luigi
ET_ is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
AGM algorithms for ln(x) ewmayer Other Mathematical Topics 17 2018-09-12 16:06
algorithms for special factorizations jjcale Factoring 6 2011-07-28 02:06
Quantum Algorithms? ShiningArcanine Lounge 4 2007-12-16 12:11
Implementing algorithms, did I do this right? ShiningArcanine Programming 18 2005-12-29 21:47
do you think there are better algorithms to be discovered? ixfd64 Lounge 5 2004-03-29 06:07

All times are UTC. The time now is 15:09.


Fri Jul 7 15:09:25 UTC 2023 up 323 days, 12:37, 0 users, load averages: 1.08, 1.14, 1.14

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔