mersenneforum.org

msft 2011-01-08 18:02

llrCUDA
 
1 Attachment(s)
llrpisrc.zip converted to CUDA. :devil:
Supports k*2^n+1 only, and prime inputs only.
[QUOTE]
5*2^23473+1 is prime! Time : 7.933 sec.
11*2^18759+1 is prime! Time : 6.200 sec.
99*2^83863+1 is prime! Time : 32.564 sec.
21*2^94801+1 is prime! Time : 37.656 sec.
39*2^113549+1 is prime! Time : 62.529 sec.
[/QUOTE]
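
For readers wondering what is being timed: numbers of the form k*2^n+1 are normally checked with a Proth test, which is one big modular exponentiation. Below is a minimal sketch in plain Python, assuming k is odd and k < 2^n. It is not the llrCUDA code (that does the squarings with FFT multiplication on the GPU); the helper names `jacobi` and `proth_test` are made up for illustration.

```python
from math import isqrt


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        # Pull out factors of 2 using the rule for (2/n).
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Quadratic reciprocity: swap, flipping the sign if both are 3 mod 4.
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def proth_test(k, n):
    """Return True if N = k*2^n + 1 is prime (requires k odd, k < 2^n)."""
    if k < 1 or k % 2 == 0 or k >= 2 ** n:
        raise ValueError("need k odd with 0 < k < 2^n")
    N = k * 2 ** n + 1
    if isqrt(N) ** 2 == N:
        return False  # perfect squares have no base with Jacobi symbol -1
    a = 2
    while True:
        j = jacobi(a, N)
        if j == 0:
            return False  # a shares a factor with N, so N is composite
        if j == -1:
            break
        a += 1
    # Proth: with (a/N) = -1, N is prime iff a^((N-1)/2) = -1 (mod N).
    return pow(a, (N - 1) // 2, N) == N - 1


print(proth_test(5, 7))   # 5*2^7+1 = 641
print(proth_test(3, 4))   # 3*2^4+1 = 49
```

The cost is about n modular squarings of an n-bit number, which is why the run time grows so quickly with n in the timings above.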

Mini-Geek 2011-01-08 18:22

[QUOTE=msft;245150]llrpisrc.zip converted to CUDA. :devil:
Supports k*2^n+1 only, and prime inputs only.[/QUOTE]

Looks like it's significantly slower than a CPU right now:
[CODE]99*2^83863+1 is prime! Time : 7.407 sec.[/CODE]
If its speed can be improved and at least the prime-only limitation is removed, this could be a huge thing for projects like NPLB!

em99010pepe 2011-01-08 18:26

Core i5 750 @ 3.6 GHz with LLR version 3.8.4.

[code]5*2^23473+1 is prime! Time : 332.759 ms.
11*2^18759+1 is prime! Time : 167.324 ms.
99*2^83863+1 is prime! Time : 4.739 sec.
21*2^94801+1 is prime! Time : 5.356 sec.
39*2^113549+1 is prime! Time : 7.782 sec.[/code]
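
To put the two sets of timings side by side, here is a small sketch that divides the llrCUDA times from the first post by the CPU times listed here. The numbers are copied from the posts; the function name `slowdown` is just for illustration, and the two runs were on different machines, so treat the ratios as rough.

```python
# Timings in seconds, copied from the first post (llrCUDA) and this post (LLR 3.8.4).
gpu = {
    "5*2^23473+1": 7.933,
    "11*2^18759+1": 6.200,
    "99*2^83863+1": 32.564,
    "21*2^94801+1": 37.656,
    "39*2^113549+1": 62.529,
}
cpu = {
    "5*2^23473+1": 0.332759,
    "11*2^18759+1": 0.167324,
    "99*2^83863+1": 4.739,
    "21*2^94801+1": 5.356,
    "39*2^113549+1": 7.782,
}


def slowdown(gpu_times, cpu_times):
    """Ratio of GPU time to CPU time for each candidate."""
    return {c: gpu_times[c] / cpu_times[c] for c in gpu_times}


ratios = slowdown(gpu, cpu)
for candidate, ratio in ratios.items():
    print(f"{candidate:>14}: llrCUDA takes {ratio:.1f}x as long")
```

The gap is largest for the two smallest candidates and narrows for the larger ones.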

mdettweiler 2011-01-08 19:11

:party:

Awesome! :big grin:

Gary and I are still working to get his GPU functioning again...we've been running into some strange issues with driver config on Ubuntu 10.04, but will hopefully be able to get it working soon. As soon as we do, I'm open to help with any testing that's needed.

em99010pepe 2011-01-08 19:32

msft,

Please make a test with bigger numbers:

5*2^1282755+1
5*2^1320487+1

Ralf Recker 2011-01-08 22:23

[QUOTE=em99010pepe;245157]msft,

Please make a test with bigger numbers:

5*2^1282755+1
5*2^1320487+1[/QUOTE]

Just started the first one on my lowly GTX 460 (MSI factory overclocked @ 725 MHz, 64-bit Linux, compiled with the CUDA 3.1 toolkit, driver version 256.53) and on one core of a Q9550 @ 3.6 GHz. Results to follow...

First impression:

ralf@quadriga ~/llrcuda.0.07 $ time ./llrCUDA -q"5*2^1282755+1" -d
Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
5*2^1282755+1, bit: 20000 / 1282757 [1.55%]. Time per bit: 2.113 ms.

Quick comparison: Time per bit on the CPU: ~0.812 ms.

CPU Result (LLR 3.8.4):

ralf@quadriga ~ $ time sllr -q"5*2^1282755+1" -d
Resuming Proth prime test of 5*2^1282755+1 at bit 20876 [1.62%]


5*2^1282755+1 is prime! Time : 1041.208 sec.

real 17m4.170s
user 17m2.276s
sys 0m1.640s

I accidentally interrupted the CPU run. The first 1.62% took:

real 0m17.266s
user 0m17.113s
sys 0m0.028s

so you need to add ca. 17 seconds to the 1041 seconds above...

Ralf Recker 2011-01-08 23:06

OK. Here is the result from the GPU (details above):

ralf@quadriga ~/llrcuda.0.07 $ time ./llrCUDA -q"5*2^1282755+1" -d
Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
5*2^1282755+1 is prime! Time : 2708.763 sec.

real 45m8.793s
user 45m2.749s
sys 0m5.644s

Edit: A version compiled with --arch=sm_21 is slower (2.155 ms per bit), a version compiled with --arch=sm_20 is a tiny bit faster (2.085 ms per bit).
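
As a sanity check on the figures in these two posts, the totals can be converted back to time per bit. This is only a sketch of the arithmetic: the bit count comes from the progress line, and the CPU total assumes the interrupted first part is counted at its wall-clock time of 17.266 s.

```python
BITS = 1282757            # from the progress line "bit: 20000 / 1282757"

gpu_total_s = 2708.763    # llrCUDA on the GTX 460
cpu_resumed_s = 1041.208  # LLR 3.8.4, resumed run
cpu_first_part_s = 17.266 # assumption: wall-clock time of the interrupted 1.62%


def ms_per_bit(total_s, bits):
    """Average time per bit in milliseconds."""
    return total_s * 1000.0 / bits


cpu_total_s = cpu_resumed_s + cpu_first_part_s
gpu_ms = ms_per_bit(gpu_total_s, BITS)
cpu_ms = ms_per_bit(cpu_total_s, BITS)

print(f"GPU: {gpu_ms:.3f} ms/bit")
print(f"CPU: {cpu_ms:.3f} ms/bit")
print(f"GPU/CPU: {gpu_total_s / cpu_total_s:.2f}x")
```

The GPU average lands close to the 2.113 ms per bit shown in the progress output, and the CPU average is close to the ~0.812 ms quick estimate.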

em99010pepe 2011-01-08 23:40

Ralf Recker,

First of all, thank you. Second, can you post the specs of your machine (memory, hard drives, DVD-R, etc.)? I want to do some energy-efficiency calculations, so I need to know the number and type of components in it to estimate its energy consumption.

msft 2011-01-09 00:34

GTX460:
5*2^1282755+1 is prime! Time : 4491.564 sec.
5*2^1320487+1 is prime! Time : 4447.951 sec.

msft 2011-01-09 02:17

1 Attachment(s)
Fixed the abort on non-prime inputs.
[QUOTE]
5*2^23471+1 is not prime. Proth RES64: FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE Time : 8.686 sec.
5*2^23473+1 is prime! Time : 8.206 sec.
[/QUOTE]
The RES64 value is a bug in the original llrpisrc.zip (on 64-bit Linux). Can somebody fix it?

Jean Penné 2011-01-09 09:49

Very interesting work!
 
Hi,

First, best wishes to you for a happy new year, and many congrats for
this work! Indeed, I am very interested in your attempts, although I have
presently neither hardware nor software to develop my code with CUDA...

However, I am now working on a new version of llrpi, which is no longer
limited to IBDWT and small k's: it works with zero-padded FFT for k's
from 22 to 45 bits long, and generic modular reduction for larger k's.

Moreover, the portable "gwpnum" code is written as a library, like
George Woltman's "gwnum".

It seems to work fine for k*2^n+1 and k*2^n-1 numbers (and using generic
reduction for more general ones), so I shall release the new source shortly.
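
The choice of method by size of k described above could be sketched roughly as follows. This only illustrates the thresholds mentioned in this post and is not the gwpnum code; the function name `reduction_strategy` is hypothetical, and using IBDWT for every k shorter than 22 bits is an assumption read from "small k's".

```python
def reduction_strategy(k):
    """Pick a multiplication/reduction method from the bit length of k.

    Thresholds follow the post: 22 to 45 bits use zero-padded FFT,
    larger k use generic modular reduction. Treating everything below
    22 bits as IBDWT is an assumption.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    bits = k.bit_length()
    if bits < 22:
        return "IBDWT"
    if bits <= 45:
        return "zero-padded FFT"
    return "generic modular reduction"


for k in (5, 99, 1 << 21, 1 << 45):
    print(k.bit_length(), "bits ->", reduction_strategy(k))
```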

Best Regards,
Jean

