mersenneforum.org ECM for CUDA GPUs in latest GMP-ECM ?

2012-01-27, 21:27   #1
Karl M Johnson

Mar 2010

3×137 Posts

ECM for CUDA GPUs in latest GMP-ECM?

Greetings. While working on gmp-ecm, I discovered a "gpu" folder in the latest svn. Can somebody shed some light on it? Is it a fully working gmp-ecm implementation, or only partially GPU-accelerated? Any info will do!

2012-01-27, 21:47   #2
xilman
Bamboozled!

May 2003
Down not across

2²·41·61 Posts

Quote:
 Originally Posted by Karl M Johnson Greetings. While working on gmp-ecm, I discovered a "gpu" folder inside latest svn. Can somebody shed some light on it? Is it a fully-working gmp-ecm implementation ? Or partially gpu-accelerated ? Any info will do!
I'm playing with it. Not yet found a new factor with it. Current activity is to work out what's going on and then to see whether I can contribute to the effort. So far I've found areas where I may be able to help Cyril with development.

So far, it is Stage 1 only (as one should expect) and each bit in the product of prime powers up to B1 requires a separate kernel call. It would be easy enough to make the entire sequence of elliptic curve arithmetic operations a single kernel but not obviously a good idea (think about it).
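
To make the per-bit cost concrete, here is a minimal Python sketch of ECM stage 1 written as a Montgomery ladder over the bits of s, the product of the largest prime powers not exceeding B1. It is not GPU-ECM's code: the curve parameter `A`, the starting coordinate `x0`, and every function name are assumptions chosen for illustration, and each loop iteration merely stands in for what is described above as one kernel call. Since log s is close to B1, s has roughly B1/ln 2 bits, which is about 1.6×10^8 for B1 = 110M.

```python
from math import gcd

def primes_upto(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def stage1_scalar(b1):
    """s = product over primes p <= B1 of the largest power of p that is <= B1."""
    s = 1
    for p in primes_upto(b1):
        q = p
        while q * p <= b1:
            q *= p
        s *= q
    return s

def xdbl(P, a24, n):
    """x-only doubling on a Montgomery curve, a24 = (A + 2) / 4 mod n."""
    X, Z = P
    u = (X + Z) ** 2 % n
    v = (X - Z) ** 2 % n
    t = (u - v) % n
    return (u * v % n, t * (v + a24 * t) % n)

def xadd(P, Q, D, n):
    """x-only differential addition: returns P + Q given D = P - Q."""
    (X1, Z1), (X2, Z2), (X0, Z0) = P, Q, D
    u = (X1 - Z1) * (X2 + Z2) % n
    v = (X1 + Z1) * (X2 - Z2) % n
    return (Z0 * (u + v) ** 2 % n, X0 * (u - v) ** 2 % n)

def ladder(k, P, a24, n):
    """Compute k*P (k >= 1); one loop iteration per bit of k after the top bit."""
    r0, r1 = P, xdbl(P, a24, n)
    steps = 0
    for bit in bin(k)[3:]:  # skip '0b' and the leading 1 bit
        if bit == "1":
            r0, r1 = xadd(r0, r1, P, n), xdbl(r1, a24, n)
        else:
            r0, r1 = xdbl(r0, a24, n), xadd(r0, r1, P, n)
        steps += 1  # stands in for one kernel launch (illustrative only)
    return r0, steps

def ecm_stage1(n, b1, x0=2, A=7):
    """One stage-1 attempt on odd n; x0 and A are arbitrary illustrative choices."""
    a24 = (A + 2) * pow(4, -1, n) % n  # needs Python 3.8+ and odd n
    (X, Z), steps = ladder(stage1_scalar(b1), (x0 % n, 1), a24, n)
    return gcd(Z, n), steps
```

A returned gcd strictly between 1 and n is a factor; 1 or n means this curve found nothing at that B1. The sketch handles one curve with Python integers, whereas the point of a GPU version is to step many curves through the same bit in parallel.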

The code currently uses fixed kilobit arithmetic and so is limited to factoring integers somewhat under that size. One area where I may be able to help is adding flexibility in that regard. Another is to improve the underlying arithmetic primitives. A third is to reduce the (presently extortionate IMO) amount of CPU time used by busy-waiting for the kernels to complete.

Paul

2012-01-28, 06:55   #3
Karl M Johnson

Mar 2010

3×137 Posts

Oh, alright. It can't replace gmp-ecm yet. Is Stage 2 on GPUs not really feasible (except on Teslas) because it requires a lot of RAM?

2012-02-10, 15:35   #4
xilman
Bamboozled!

May 2003
Down not across

2²·41·61 Posts

Quote:
 Originally Posted by Karl M Johnson Oh, alright. It can't substitute gmp-ecm yet.
Actually it can. I just found my first factor with gpu-ecm:
Code:
Resuming ECM residue saved by pcl@anubis.home.brnikat.com with GPU-ECM 0.1
Input number is 27199999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (275 digits)
Using B1=110000000-0, B2=110000000-776278396540, polynomial Dickson(30), A=112777948379516601562499999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999998
Step 1 took 0ms
Step 2 took 176954ms
********** Factor found in step 2: 7219650603145651593481420276356225303436099
Found probable prime factor of 43 digits: 7219650603145651593481420276356225303436099
Composite cofactor 3767495339476507669490528975999036413199084363632138583182977316467718682572689799651214383863597597569613063674073098348731470273762390945944125050829513627232967008803673015815350396060271580465332444072090424114938478394067646101 has 232 digits
A total of 1792 curves were run at B1=110M and the factor was found on the 1349th second stage. I'm running the remainder in case another factor can be found.

Each stage one took 70 seconds on a GTX-460. The latest ECM takes 679 seconds per stage 1 on a single core of a 1090T clocked at 3264MHz, so the GPU version is close to 10 times faster in this situation.
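
The arithmetic behind that comparison can be written out as a short Python check. The two timings and the curve count are the figures quoted in this post; the projected totals assume, purely for illustration, that 70 s is an effective per-curve cost on the GPU and that the CPU figure scales linearly on one core.

```python
# Figures quoted in the post
GPU_STAGE1_S = 70.0    # stage 1 on a GTX-460
CPU_STAGE1_S = 679.0   # stage 1 on one 1090T core at 3264 MHz
CURVES = 1792

speedup = CPU_STAGE1_S / GPU_STAGE1_S

# Assumption: both timings are per-curve costs that simply add up.
gpu_hours = CURVES * GPU_STAGE1_S / 3600
cpu_days = CURVES * CPU_STAGE1_S / 86400

print(f"speedup: {speedup:.1f}x")
print(f"all curves, GPU: {gpu_hours:.1f} hours")
print(f"all curves, one CPU core: {cpu_days:.1f} days")
```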

Paul

Last fiddled with by xilman on 2012-02-10 at 16:09

2012-02-10, 15:46   #5
Brain

Dec 2009
Peine, Germany

101001011₂ Posts

GPU ECM 0.1

Could anybody provide a (link to a) (Windows 64) binary for GPU-ECM 0.1?

2012-02-10, 16:08   #6
xilman
Bamboozled!

May 2003
Down not across

2²×41×61 Posts

Quote:
 Originally Posted by xilman Actually it can. I just found my first factor with gpu-ecm
Surprise is no longer adequate, and I'm forced to resort to astonishment:
Code:
Using B1=110000000-0, B2=110000000-776278396540, polynomial Dickson(30), A=113233923912048339843749999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999998
Step 1 took 0ms
Step 2 took 176408ms
********** Factor found in step 2: 2315784840580190375316972295830305082761
Found probable prime factor of 40 digits: 2315784840580190375316972295830305082761
Composite cofactor 11745478044145667127387304121681137574169838448099828222193156280305822325912582394902125137121744003039884215185197839759773264126893037178852537820600462333360764644042755609744438099938488940900666235577910910744169710591419476281159 has 236 digits
That took another 18 curves.

If I was unlucky not to find the p43 earlier than this, given the amount of ECM work performed, I was especially unlucky not to find the p40. The candidate number, GW(10,272), had no previously known factors despite having had a complete t40 run through the ECMNET server here and clients around the world.

The cofactor is now c193 so it's worth seeing whether the remaining curves will find anything.
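
The two logs can be sanity-checked with a few lines of Python. The sketch assumes that GW(10,272) denotes the generalized Woodall number 272·10^272 − 1, which matches the 275-digit input shown in the first log (271 followed by 272 nines); the factors are copied from the logs, and the helper names are made up for this example.

```python
# Assumption: GW(10,272) = 272 * 10**272 - 1 (consistent with the logged input).
N = 272 * 10**272 - 1

# Factors as printed in the two logs above
P43 = 7219650603145651593481420276356225303436099
P40 = 2315784840580190375316972295830305082761

def digits(m):
    """Number of decimal digits of a positive integer."""
    return len(str(m))

def check_factors(n, factors):
    """Divide out each claimed factor; report which ones really divide n."""
    cofactor = n
    results = []
    for p in factors:
        divides = cofactor % p == 0
        if divides:
            cofactor //= p
        results.append((p, divides))
    return results, cofactor

results, cofactor = check_factors(N, [P43, P40])
for p, divides in results:
    print(f"p{digits(p)} divides N: {divides}")
print(f"remaining cofactor has {digits(cofactor)} digits")
```

If both factors were transcribed faithfully, each line should report that the factor divides N, and the remaining cofactor is the c193 mentioned above.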

Paul

2012-02-10, 16:23   #7
xilman
Bamboozled!

May 2003
Down not across

10004₁₀ Posts

Quote:
 Originally Posted by Brain Could anybody provide a (link to a) (Windows 64) binary for GPU-ECM 0.1?
Not me.

2012-02-10, 16:25   #8
pinhodecarlos

"Carlos Pinho"
Oct 2011
Milton Keynes, UK

3·1,523 Posts

Quote:
 Originally Posted by xilman Not me.
And for linux?

2012-02-10, 16:56   #9
xilman
Bamboozled!

May 2003
Down not across

2714₁₆ Posts

Quote:
 Originally Posted by pinhodecarlos And for linux?
If you have Linux you can build from the SVN sources as easily as I can.

The process really is very straightforward and you'll end up with something which doesn't carry the risk of the Linux equivalent of DLL-hell.

If you really are not lazy(*) enough to build your own, I could make available the binary I use. No guarantees that it will work, or even run, on any other Linux system. It almost certainly won't work optimally unless you have exactly the same environment as me.

Paul

* Sometimes it's much better to do some work ahead of time to remove the need to do much more work later. That's true laziness.

2012-02-10, 21:30   #10
Karl M Johnson

Mar 2010

3·137 Posts

Windows binary wanted

2012-02-11, 00:07   #11
R.D. Silverman

Nov 2003

16373₈ Posts

Quote:
 Originally Posted by xilman Actually it can. I just found my first factor with gpu-ecm
Awesome.

Is the code specific to a particular GPU? How portable is it?
