![]() |
|
|
#1 |
|
Banned
"Luigi"
Aug 2002
Team Italia
5×7×139 Posts |
ECM on graphics cards (Bernstein - Chen - Cheng - Lange - Yang)
GMP implementations on CUDA (Liu - Tong) Multi-precision modular multiplication on GPU (Zhao) The billion-Mulmod-per-second PC (Bernstein - Chen - Chen - Cheng - Hsiao - Lange - in - Yang) If you can provide more links, maybe this thread is worth a "sticky" bit. ![]() Luigi Last fiddled with by ET_ on 2011-11-28 at 15:12 |
|
|
|
|
|
#2 |
|
"Mark"
Apr 2003
Between here and the
1CAD16 Posts |
How about OpenCL algorithms?
|
|
|
|
|
|
#3 |
|
Tribal Bullet
Oct 2004
5·23·31 Posts |
For multiprecision arithmetic? The stuff you see here in the trial factoring on GPU section could well be all there is now.
Last fiddled with by jasonp on 2011-11-29 at 01:04 |
|
|
|
|
|
#4 |
|
Dec 2010
Monticello
5·359 Posts |
I make no claim to have this, but it seems to me that *IF* P-1 *could* benefit from GPU processing, that would be a significant benefit to GIMPS. I'd love to program it, but don't know if I can do it in a timely fashion.
I conjecture that a GPU implementation would also allow limits to be raised on numbers checked for Goldbach's conjecture by a few (3-4) orders of magnitude. Finally, if ECM is efficient on GPUs, then it's useful bounds might well extend into the current DC range. Emotionally, I prefer factors to LL checks! What about sieving for SNFS and GNFS (or even MPQS) on GPUs? I think that already exists.... |
|
|
|
|
|
#5 | ||||
|
Banned
"Luigi"
Aug 2002
Team Italia
486510 Posts |
Quote:
![]() I'm using my spare time to study how to theorically work out something, and hope I'll have some guidelines by the time you'll release a Primenet enabled mfaktc... Quote:
![]() Quote:
If (as it seems) EECM is less memory-hungry and more efficient in searching factors, then we may try to contact Bernstein's workgroup. I'd like to know what Alex Kruppa thinks about it. Quote:
Luigi |
||||
|
|
|
|
|
#6 |
|
Romulan Interpreter
"name field"
Jun 2011
Thailand
41·251 Posts |
Assuming one has a Fermi (or two Fermi's
) with 3GB (each) or 6Gb (each, for Tesla's) and 384 bit bus for internal mem access (each). How would this influence the speed of P-1 and/or ECM? I would be willing to try, if any software available. No time and no knowledge to try to write my own. GPU's now are very different from the GPU's at the time when that papers were written. I don't believe the mem size or the mem access is a limitation anymore. When I run CudaLucas, the GPU's mem controller is anywhere between 30% and 40% busy, for a 95% GPU occupation, and when I run mfaktc (need 3 copies to max the GPU, i.e. to raise it to 95-98%) the mem controller is 1% busy. No joke. 1%! Here also there is a lot of place for improvement, I think. Like for example, use GPU for sieving too, and let the CPU free for other tasks. Contrary to CudaLucas, which does not need the CPU at all (almost) mfaktc is taking lots of CPU resources for sieving or whatever else, and meantime, GPU's mem is "wasted", sleeping, not used at all.
Last fiddled with by LaurV on 2011-11-29 at 10:28 |
|
|
|
|
|
#7 | |||
|
Banned
"Luigi"
Aug 2002
Team Italia
5·7·139 Posts |
You put different ideas on the same message.
![]() Quote:
As for the speed, it all depends on both the algorithm and the level of parallelization. I.E. Karatsuba multiplication has a better asymptotic efficiency than Montgomery multiplication, but the latter is much easier to parallelize. FFT multiplication is asymptotically the fastest, but if you only rely on base 2 FFTs its benefical effects dim out. Quote:
![]() You may have 6 GB memory, but threads that are allowed to use only 64K at a time. To feed the threads, much memory management should be done. Or use of global memory, with a slowdown of about 600x on accesses. Quote:
![]() Luigi Last fiddled with by ET_ on 2011-11-29 at 10:43 |
|||
|
|
|
|
|
#8 |
|
Jun 2003
125308 Posts |
|
|
|
|
|
|
#9 |
|
Jun 2003
23×683 Posts |
|
|
|
|
|
|
#10 |
|
Banned
"Luigi"
Aug 2002
Team Italia
5×7×139 Posts |
Last fiddled with by ET_ on 2011-11-29 at 12:10 |
|
|
|
|
|
#11 |
|
Banned
"Luigi"
Aug 2002
Team Italia
5×7×139 Posts |
Another one:
Modular Reduction of Large Integers Using Classical, Barrett, Montgomery Algorithms Not specifically CUDA-aware, but interesting, since TheJudger used Barrett algorithm for mfaktc... ![]() Luigi |
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| AGM algorithms for ln(x) | ewmayer | Other Mathematical Topics | 17 | 2018-09-12 16:06 |
| algorithms for special factorizations | jjcale | Factoring | 6 | 2011-07-28 02:06 |
| Quantum Algorithms? | ShiningArcanine | Lounge | 4 | 2007-12-16 12:11 |
| Implementing algorithms, did I do this right? | ShiningArcanine | Programming | 18 | 2005-12-29 21:47 |
| do you think there are better algorithms to be discovered? | ixfd64 | Lounge | 5 | 2004-03-29 06:07 |