mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   llrCUDA (https://www.mersenneforum.org/showthread.php?t=14608)

msft 2011-02-17 03:10

[code]
llrcuda.0.48$ time ./llrCUDA -d -q3*2^2312734-1
Starting Lucas Lehmer Riesel prime test of 3*2^2312734-1
Using real irrational base DWT, FFT length = 131072
V1 = 9 ; Computing U0...done.
3*2^2312734-1, iteration : 10000 / 2312734 [0.43%]. Time per iteration : 0.792
3*2^2312734-1 is prime! Time : 1815.005 sec.

real 30m21.324s
user 22m56.700s
sys 6m57.070s
[/code]

msft 2011-02-17 08:46

This version uses gwpnum/giants.[ch].

3*2^414840-1 is prime! Time : 138.928 sec.
3*2^382449+1 is prime! Time : 141.989 sec.

ltd 2011-02-17 17:15

I am at a customer's site until Sunday, so there will be no new Windows test version until Sunday evening.

mdettweiler 2011-02-17 19:24

As expected, my second test with 0.48 on another of the PSP numbers also segfaulted:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100630+1
Starting Proth prime test of 237019*2^6100630+1
Using complex irrational base DWT, FFT length = 2097152, a = 3
Segmentation fault
[/code]
I'll give it another go with 0.49.

Edit: Wow! :shock: I started a test with 0.49 on 237019*2^6100018+1 about half an hour ago and it's currently at 3.60% already--with times of 5.052 ms/iter.! That's about twice the speed of 0.48, which IIRC was in the vicinity of 10 ms/iter.
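A rough way to turn those per-iteration timings into a total runtime estimate is sketched below. It assumes the test runs about n iterations (the progress line "iteration : 10000 / 2312734" earlier in the thread suggests this for n = 2312734) and that the reported figure is in milliseconds. The function names are made up for illustration.

```python
def estimated_runtime_seconds(n_iterations, ms_per_iteration):
    """Estimated total runtime, assuming a constant time per iteration."""
    return n_iterations * ms_per_iteration / 1000.0


def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS (rounded to whole seconds)."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return "%d:%02d:%02d" % (hours, minutes, secs)


# 237019*2^6100018+1 at the 0.49 rate and at the approximate 0.48 rate.
for label, ms in (("0.49", 5.052), ("0.48", 10.0)):
    estimate = estimated_runtime_seconds(6100018, ms)
    print(label, format_hms(estimate))
```

The same arithmetic applied to the 3*2^2312734-1 run at the top of the thread gives 2312734 x 0.792 ms, about 1832 seconds, which is close to the reported 1815.005 sec.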

Ken_g6 2011-02-18 01:26

[QUOTE=msft;252656][CODE]
llr:
  loop:
    mul_two_to_phi
    fft
    mul
    fft
    mul_two_to_minusphi
    normalize
MacLucasFFTW:
  loop:
    fft
    mul
    fft
    mul_two_to_phi
    mul_two_to_minusphi
    normalize
[/CODE]
The placement of mul_two_to_phi was the cause of the poor performance.[/QUOTE]
Would this apply to GeneferCUDA as well? :w00t:
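For readers unfamiliar with the step names in the quoted pseudocode, here is a minimal sketch of one weighted-FFT squaring loop in the "llr" order (weight, FFT, multiply, FFT, unweight, normalize). It is not llrCUDA's code. It assumes the simpler Mersenne case 2^q - 1 with the textbook Crandall-Fagin irrational base DWT, uses a slow pure-Python FFT, and reads mul_two_to_phi / mul_two_to_minusphi as applying and removing the DWT weights, which is an interpretation rather than something stated in the thread.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def lucas_lehmer_ibdwt(q, n):
    """Lucas-Lehmer test of 2**q - 1 (q an odd prime), FFT length n."""
    pos = [-(-q * j // n) for j in range(n + 1)]       # ceil(q*j/n): bit offset of digit j
    bits = [pos[j + 1] - pos[j] for j in range(n)]     # variable digit sizes
    weight = [2.0 ** (pos[j] - q * j / n) for j in range(n)]
    mask = [(1 << b) - 1 for b in bits]
    x = [(4 >> pos[j]) & mask[j] for j in range(n)]    # starting value 4
    for _ in range(q - 2):
        w = [x[j] * weight[j] for j in range(n)]       # mul_two_to_phi (apply weights)
        f = fft(w)                                     # fft (forward)
        f = [v * v for v in f]                         # mul (pointwise square)
        w = fft(f, inverse=True)                       # fft (inverse, unscaled)
        # mul_two_to_minusphi: remove weights, scale by 1/n, round to integers
        x = [round(w[j].real / (n * weight[j])) for j in range(n)]
        # normalize: subtract 2, then propagate carries cyclically (mod 2**q - 1)
        x[0] -= 2
        carry = 0
        for j in range(n):
            v = x[j] + carry
            carry = v >> bits[j]
            x[j] = v & mask[j]
        j = 0
        while carry:
            v = x[j] + carry
            carry = v >> bits[j]
            x[j] = v & mask[j]
            j = (j + 1) % n
    # A zero residue shows up as all-zero digits or as 2**q - 1 (all digits full).
    return all(d == 0 for d in x) or all(d == m for d, m in zip(x, mask))


print([q for q in (5, 7, 11, 13, 17, 19) if lucas_lehmer_ibdwt(q, 4)])
```

The quoted comparison is only about where the weighting step sits relative to the two FFTs; the arithmetic per iteration is the same in both orderings.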

msft 2011-02-18 03:40

[QUOTE=Ken_g6;252862]Would this apply to GeneferCUDA as well? :w00t:[/QUOTE]
Genefer has only 1 loop, so it is very efficient.

mdettweiler 2011-02-18 04:09

This just in:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1
Starting Proth prime test of 237019*2^6100018+1
Using complex irrational base DWT, FFT length = 1048576, a = 3
Segmentation fault
[/code]
So it looks like 0.49 still segfaults on numbers this large. But on the plus side, it went a lot faster! :big grin: (Since it just printed "Segmentation fault" at the end instead of an actual result, I don't have an overall timing figure...I should probably run these inside the "time" command in the future to cover that contingency. But the iteration timings, as I mentioned before, were about half what they were with 0.48.)
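As an alternative to the shell's "time", a small wrapper can record the wall-clock time and also report whether the program died from a signal. This is only a sketch: run_timed is a hypothetical helper name, and it relies on the POSIX behaviour of Python's subprocess module, where a negative return code -N means the child was terminated by signal N.

```python
import signal
import subprocess
import time


def run_timed(argv):
    """Run argv, return (elapsed_seconds, status_string)."""
    start = time.perf_counter()
    proc = subprocess.run(argv)
    elapsed = time.perf_counter() - start
    if proc.returncode < 0:
        # On POSIX a negative return code means "killed by signal -returncode".
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = "signal %d" % -proc.returncode
        status = "killed by " + name
    else:
        status = "exit code %d" % proc.returncode
    return elapsed, status


# Example call (not executed here), using the command line from this post:
#   elapsed, status = run_timed(["./llrCUDA", "-d", "-q237019*2^6100018+1"])
#   print("%.1f s, %s" % (elapsed, status))
```

Passing the arguments as a list avoids shell quoting problems with the * and ^ characters in the -q expression.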

I'll try some other large numbers (not quite this large though) to see if I can get a better idea of what the upper limit is. Edit: and I'll also try a similarly large LLR test to see if it has the same problem.

mdettweiler 2011-02-18 19:38

An LLR test of slightly smaller n and rather smaller k (i.e., next lower FFT length) also segfaulted at the end:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ time ./llrCUDA -d -q3*2^6090515-1
Starting Lucas Lehmer Riesel prime test of 3*2^6090515-1
Using real irrational base DWT, FFT length = 524288
V1 = 3 ; Computing U0...done.

Segmentation fault

real 245m0.494s
user 106m2.130s
sys 88m35.430s
[/code]
So it appears that the segfaults on large numbers are NOT due to the FFT (I've successfully done FFT=524288 numbers with llrCUDA before). It might be the n, or more precisely the number of bits the number has (which for base 2 is very close to n).
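To put numbers on "very close to n", the bit length of k*2^n+c can be computed directly with Python's big integers. The helper name below is made up for illustration.

```python
def bit_length_of(k, n, c):
    """Number of bits in k*2**n + c, computed exactly with big integers."""
    return ((k << n) + c).bit_length()


# Candidates mentioned in this thread: (k, n, c)
for k, n, c in ((3, 2312734, -1), (3, 6090515, -1), (237019, 6100630, 1)):
    total = bit_length_of(k, n, c)
    print("%d*2^%d%+d: %d bits (n + %d)" % (k, n, c, total, total - n))
```

For these candidates the difference from n is just the bit length of k, i.e. 2 bits for k=3 and 18 bits for k=237019.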

msft 2011-02-18 22:23

The giants bug is not fixed in 0.49.

[code]
llrcuda.0.49$ time ./llrCUDA -d -q27653*2^9167433+1

27653*2^9167433+1 is prime! Time : 102455.008 sec.

real 1707m59.834s
user 726m25.230s
sys 954m28.110s
[/code]

mdettweiler 2011-02-19 00:03

[QUOTE=msft;252974]The giants bug is not fixed in 0.49.

llrcuda.0.49$ time ./llrCUDA -d -q27653*2^9167433+1

27653*2^9167433+1 is prime! Time : 102455.008 sec.

real 1707m59.834s
user 726m25.230s
sys 954m28.110s[/QUOTE]
Whoa...how'd you manage that one? :huh: Did you change something from the standard 0.49 so that it wouldn't segfault on a number that big?

mdettweiler 2011-02-19 01:48

It would appear n=~5.08M is still too big for 0.49 (at least the stock version) to handle:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ time ./llrCUDA -d -q3*2^5082306+1
Starting Proth prime test of 3*2^5082306+1
Using complex irrational base DWT, FFT length = 524288, a = 5
Segmentation fault

real 207m14.769s
user 96m32.650s
sys 64m29.190s
[/code]


All times are UTC.