mersenneforum.org

llrCUDA (https://www.mersenneforum.org/showthread.php?t=14608)

Honza 2011-02-15 15:39

More tests:
[CODE]llrcuda -q"3*2^164987-1" -d
too small Exponent...

llrcuda -q"39*2^113549-1" -d
too small Exponent...[/CODE]

GPU usage is close to 100%, and one CPU core is at 100% the whole time (that's on the latest Sandy Bridge, an i5-2500 @ 3.3 GHz).
[CODE]Starting Lucas Lehmer Riesel prime test of 3*2^414840-1
Using real irrational base DWT, FFT length = 65536
V1 = 3 ; Computing U0...
V1 = 3 ; Computing U0...done.
Starting Lucas-Lehmer loop...
... 98.83%]. Time per iteration : 0.273 ms.
3*2^414840-1 is prime! Time : 112.834 sec.[/CODE]

mdettweiler 2011-02-15 17:45

[QUOTE=Honza;252570]GPU usage is close to 100%, and one CPU core is at 100% the whole time (that's on the latest Sandy Bridge, an i5-2500 @ 3.3 GHz).[/QUOTE]
That's interesting... in my tests (GTX 460, stock Q6600 @ 2.4 GHz) I only ever see a small amount of CPU usage. This may be a difference between the Windows and Linux versions.

BTW, how do you check the GPU usage %? I looked in the "nvidia-settings" GUI program (this is on Linux, Ubuntu 10.04.1 to be exact) but didn't see anything for that, just a temperature meter.

ltd 2011-02-15 17:56

[QUOTE=mdettweiler;252504]Could you possibly post a few PSP first-pass residues and the amount of time it took to run the tests (and on what hardware)? Then I could run them on Gary's GPU and compare speeds (since I already have a working llrCUDA setup).[/QUOTE]

Here are some results from the DC (double-check) effort, so the residues are already verified.
The machine is an i7 920 @ 2.67 GHz with seven threads running.

[2010-12-24 19:44:47 WEST] Candidate: 237019*2^6100018+1 Program: llr.exe Residue: 34687837ED148D74 Time: 95109 seconds
[2010-12-25 21:38:49 WEST] Candidate: 237019*2^6100630+1 Program: llr.exe Residue: 381782D8C112D665 Time: 93141 seconds
[2010-12-27 00:15:56 WEST] Candidate: 237019*2^6101242+1 Program: llr.exe Residue: 1A0C5A3E8372AFC2 Time: 95784 seconds
[2010-12-28 08:59:07 WEST] Candidate: 237019*2^6102286+1 Program: llr.exe Residue: 4E2920067DCBCF57 Time: 91167 seconds
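Since llrCUDA reports "Time per bit", it may help to convert these totals into milliseconds per bit before comparing. Below is a small Python sketch that parses lines in the format above. It assumes the bit count is n plus the bit length of k minus one, which matches the "bit: 230000 / 6100035" counter in msft's gdb transcript of 237019*2^6100018+1 later in the thread; the helper name `ms_per_bit` is only illustrative. Note that these CPU times come from one of seven threads sharing the i7, so they are not single-thread best cases.

```python
import re

# Matches result lines such as:
# [2010-12-24 19:44:47 WEST] Candidate: 237019*2^6100018+1 Program: llr.exe Residue: ... Time: 95109 seconds
LINE_RE = re.compile(r"Candidate: (\d+)\*2\^(\d+)([+-])1 .*? Time: (\d+) seconds")

def ms_per_bit(line):
    """Return (candidate, bits, ms_per_bit) for one result line (hypothetical helper)."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("unrecognized line: " + line)
    k, n = int(m.group(1)), int(m.group(2))
    sign, seconds = m.group(3), int(m.group(4))
    # Assumption: the test runs over n + bitlen(k) - 1 bits, as in the
    # "bit: x / 6100035" counter printed for 237019*2^6100018+1.
    bits = n + k.bit_length() - 1
    candidate = f"{k}*2^{n}{sign}1"
    return candidate, bits, 1000.0 * seconds / bits

if __name__ == "__main__":
    log = (
        "[2010-12-24 19:44:47 WEST] Candidate: 237019*2^6100018+1 Program: llr.exe "
        "Residue: 34687837ED148D74 Time: 95109 seconds\n"
        "[2010-12-25 21:38:49 WEST] Candidate: 237019*2^6100630+1 Program: llr.exe "
        "Residue: 381782D8C112D665 Time: 93141 seconds"
    )
    for line in log.splitlines():
        cand, bits, ms = ms_per_bit(line)
        print(f"{cand}: {bits} bits, {ms:.2f} ms/bit")
```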

ltd 2011-02-15 18:01

[QUOTE=Honza;252569]I dared to try, Win 7 x64/GTX 580.

Starting Lucas Lehmer Riesel prime test of 1000065*2^390927-1
Using rational base DWT and generic reduction, FFT length = 65536
V1 = 5 ; Computing U0...
V1 = 5 ; Computing U0...done.
Starting Lucas-Lehmer loop...
... 99.76%]. Time per iteration : 0.758 ms.
1000065*2^390927-1 is prime! Time : 296.914 sec.[/QUOTE]

Which driver version are you using?
At least we now know that the software can, in principle, run on Windows.
The next step will be to find out why it uses so much CPU.
When msft publishes a new version, I will build a new version as well.

Honza 2011-02-15 18:22

[QUOTE=mdettweiler;252580]BTW, how do you check the GPU usage %? I looked in the "nvidia-settings" GUI program (this is on Linux, Ubuntu 10.04.1 to be exact) but didn't see anything for that, just a temperature meter.[/QUOTE]
Latest GPU-Z.

Honza 2011-02-15 18:27

[QUOTE=ltd;252586]Which driver version are you using?
At least we now know that the software can, in principle, run on Windows.
The next step will be to find out why it uses so much CPU.
When msft publishes a new version, I will build a new version as well.[/QUOTE]
Driver version 263.06. I *think* drivers older than 260 don't support the GTX 500 series.

Yeah, the CPU usage. For comparison, tpsieve uses almost no CPU.

Ken_g6 2011-02-15 23:07

[QUOTE=Ken_g6;251846]I also looked, and CUDA flags are never set, so blocking sync is never enabled.[/QUOTE]High CPU usage might be fixed by changing this. However, GeneferCUDA does have this enabled (except in the most recent version) and has high CPU usage on my Linux machine with CUDA 3.1 anyway. :unsure:

Edit: Oh, and there's a known bug in the 260.* Linux drivers with blocking sync, but that code snippet I posted earlier might get around it.
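For reference, blocking sync is requested in the CUDA runtime with `cudaSetDeviceFlags`, which in a C program like llrCUDA would be called before the first runtime call that creates the device context. To keep all examples here in one language, the sketch below makes the same call from Python through `ctypes`. It assumes the runtime library can be located as "cudart" (e.g. libcudart.so on Linux); the flag values are those of the runtime's `cudaDeviceScheduleSpin`, `cudaDeviceScheduleYield` and `cudaDeviceScheduleBlockingSync` constants (CUDA 3.x calls the last one `cudaDeviceBlockingSync`). It is not taken from llrCUDA, and as noted in the post above, it may not cure the CPU usage with every driver.

```python
import ctypes
import ctypes.util

# Scheduling flags accepted by cudaSetDeviceFlags (values from the CUDA
# runtime headers; cudaDeviceBlockingSync is the older name for 0x04).
CUDA_DEVICE_SCHEDULE_AUTO = 0x00
CUDA_DEVICE_SCHEDULE_SPIN = 0x01           # busy-wait: low latency, keeps a core busy
CUDA_DEVICE_SCHEDULE_YIELD = 0x02          # spin, but yield the thread while waiting
CUDA_DEVICE_SCHEDULE_BLOCKING_SYNC = 0x04  # sleep on a sync primitive while the GPU works

def request_blocking_sync(libname="cudart"):
    """Ask the CUDA runtime to block, rather than spin, in synchronizing calls.
    Must run before the CUDA context is created. Returns the cudaError_t code
    (0 == cudaSuccess), or None if no runtime library is found."""
    path = ctypes.util.find_library(libname)
    if path is None:
        return None
    cudart = ctypes.CDLL(path)
    cudart.cudaSetDeviceFlags.argtypes = [ctypes.c_uint]
    cudart.cudaSetDeviceFlags.restype = ctypes.c_int
    return cudart.cudaSetDeviceFlags(CUDA_DEVICE_SCHEDULE_BLOCKING_SYNC)

if __name__ == "__main__":
    err = request_blocking_sync()
    if err is None:
        print("CUDA runtime library not found")
    else:
        print("cudaSetDeviceFlags returned", err)
```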

msft 2011-02-16 08:05

[CODE]
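llr:
loop:
mul_two_to_phi        // multiply by the DWT weights
fft                   // forward FFT
mul                   // pointwise multiply (squaring)
fft                   // inverse FFT
mul_two_to_minusphi   // remove the DWT weights
normalize             // round and propagate carries
MacLucasFFTW:
loop:
fft
mul
fft
mul_two_to_phi        // weighting done after the inverse FFT,
mul_two_to_minusphi   // next to normalize
normalize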
[/CODE]
The placement of mul_two_to_phi was the cause of the poor performance.
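As the names suggest, two_to_phi and two_to_minusphi are the weights of the Crandall-Fagin irrational-base discrete weighted transform (IBDWT). As a rough illustration only, here is a pure-Python sketch of one squaring modulo a Mersenne number 2^q - 1 (the k = 1 case; LLR's handling of general k*2^n±1 adds more), following the llr ordering step by step, plus a toy Lucas-Lehmer loop that uses it. The helper names are hypothetical, the recursive FFT stands in for cuFFT/FFTW, and nothing here is taken from the llrCUDA source.

```python
import cmath
import math

def fft(a, invert=False):
    """Recursive radix-2 complex FFT; len(a) must be a power of two.
    The inverse (invert=True) is left unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ibdwt_setup(q, n):
    """Word sizes and weights for 2^q - 1 split into n words (n <= q)."""
    bounds = [-(-j * q // n) for j in range(n + 1)]  # ceil(j*q/n)
    widths = [bounds[j + 1] - bounds[j] for j in range(n)]
    weights = [2.0 ** (bounds[j] - j * q / n) for j in range(n)]
    return widths, weights

def to_digits(x, widths):
    digits = []
    for w in widths:
        digits.append(x & ((1 << w) - 1))
        x >>= w
    return digits

def from_digits(digits, widths):
    x, shift = 0, 0
    for d, w in zip(digits, widths):
        x += d << shift
        shift += w
    return x

def normalize(vals, widths):
    """Carry propagation with variable word sizes; the carry out of the top
    word wraps around to word 0 because 2^q == 1 (mod 2^q - 1)."""
    out, carry = list(vals), 0
    while True:
        for j, w in enumerate(widths):
            v = out[j] + carry
            out[j], carry = v & ((1 << w) - 1), v >> w
        if carry == 0:
            return out

def square_mod_mersenne(digits, widths, weights):
    n = len(digits)
    a = [d * w for d, w in zip(digits, weights)]                  # mul_two_to_phi
    f = fft(a)                                                    # fft (forward)
    f = [z * z for z in f]                                        # mul (pointwise square)
    b = fft(f, invert=True)                                       # fft (inverse)
    prod = [round(z.real / (n * w)) for z, w in zip(b, weights)]  # mul_two_to_minusphi
    return normalize(prod, widths)                                # normalize

def lucas_lehmer(q, n=8):
    """Lucas-Lehmer test of 2^q - 1 (odd prime q >= n) using the squaring above."""
    widths, weights = ibdwt_setup(q, n)
    s = to_digits(4, widths)
    for _ in range(q - 2):
        s = square_mod_mersenne(s, widths, weights)
        s[0] -= 2
        s = normalize(s, widths)
    return from_digits(s, widths) % ((1 << q) - 1) == 0
```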

msft 2011-02-16 14:16

Original 0.48:
3*2^382449+1 is prime! Time : 189.837 sec.
3*2^414840-1 is prime! Time : 197.764 sec.
9999*2^458051+1 is prime! Time : 419.483 sec.
1000065*2^390927-1 is prime! Time : 330.561 sec.

Modification:
[CODE]
set_fftlen:
...
FFTLEN/=2;   /* added: halve FFTLEN before returning */
return (zpad);
}
[/CODE]
Modified 0.48:
3*2^382449+1 is prime! Time : 141.611 sec.
3*2^414840-1 is prime! Time : 138.820 sec.
9999*2^458051+1 is prime! Time : 264.953 sec.
1000065*2^390927-1 is prime! Time : 201.132 sec.

Interesting?
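Expressed as speedups (original time divided by modified time), the halved FFT length gives roughly 1.34x to 1.64x on these four tests. A tiny Python sketch of that arithmetic, using only the timings posted above (the dictionary and function names are illustrative):

```python
# Timings in seconds copied from the two lists above (original vs. modified 0.48).
timings = {
    "3*2^382449+1": (189.837, 141.611),
    "3*2^414840-1": (197.764, 138.820),
    "9999*2^458051+1": (419.483, 264.953),
    "1000065*2^390927-1": (330.561, 201.132),
}

def speedups(table):
    """Return {candidate: original_time / modified_time}."""
    return {name: old / new for name, (old, new) in table.items()}

if __name__ == "__main__":
    for name, s in speedups(timings).items():
        print(f"{name}: {s:.2f}x faster")
```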

mdettweiler 2011-02-16 18:40

Hmm, this turned out really well. However, when I tried testing one of ltd's large PSP numbers with 0.48, I got a segmentation fault:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1
Starting Proth prime test of 237019*2^6100018+1
Using complex irrational base DWT, FFT length = 2097152, a = 3
Segmentation fault
[/code]
The segfault occurred while I was asleep, so I'm not sure exactly where in the test it happened. I suspect it was at the very end, since the test was due to finish sometime during the night.

If anyone else would like to try to confirm this, that would be great. Meanwhile, I'll take a whack at the second number ltd gave to see if it also segfaults.

msft 2011-02-17 00:39

[CODE]

llrcuda.0.48$ gdb ./llrCUDA
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) r -d -q237019*2^6100018+1
Starting program: /home/msft/llr.cuda/049/llrcuda.0.48/llrCUDA -d -q237019*2^6100018+1
[Thread debugging using libthread_db enabled]
[New Thread 0x7f5c4cce7700 (LWP 14376)]
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1, bit: 230000 / 6100035 [3.77%]. Time per bit: 5.254 ms.
^C
[/CODE]
I guess the problem is in giants.cu: this giants.cu does not support large numbers (you can compare it with gwpnum/giants.c).
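For anyone trying to reproduce the crash: at the rate shown in the gdb transcript (5.254 ms per bit over 6100035 bits), the full test takes roughly nine hours on that setup, so a failure near the end only shows up after an overnight run. A one-function projection, assuming the per-bit time stays constant (other GPUs will differ; `eta_hours` is a hypothetical helper):

```python
def eta_hours(total_bits, ms_per_bit):
    """Projected run time in hours at a constant per-bit cost."""
    return total_bits * ms_per_bit / 1000.0 / 3600.0

if __name__ == "__main__":
    # Values from the gdb transcript: bit 230000 / 6100035, 5.254 ms per bit.
    total, done, rate = 6100035, 230000, 5.254
    print(f"full test: {eta_hours(total, rate):.1f} h")
    print(f"remaining at interrupt: {eta_hours(total - done, rate):.1f} h")
```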


All times are UTC.