More tests:
[CODE]llrcuda -q"3*2^164987-1" -d
too small Exponent...
llrcuda -q"39*2^113549-1" -d
too small Exponent...[/CODE]
GPU usage close to 100%, CPU core at 100% all the time (that's for the latest Sandy Bridge i5-2500 @ 3.3 GHz).
[CODE]Starting Lucas Lehmer Riesel prime test of 3*2^414840-1
Using real irrational base DWT, FFT length = 65536
V1 = 3 ; Computing U0...
V1 = 3 ; Computing U0...done.
Starting Lucas-Lehmer loop...
3*2^414840-1 is prime!  Time : 112.834 sec.
Time per iteration : 0.273 ms.[/CODE]
[QUOTE=Honza;252570]GPU usage close to 100%, CPU core at 100% all the time (that's for latest Sandy Bridge i5-2500@3.3Ghz)[/QUOTE]
That's interesting...in my tests (GTX 460, stock Q6600@2.4Ghz) I only ever get a small amount of CPU usage. This may be a difference between the Windows and Linux versions. BTW, how do you check the GPU usage %? I looked in the "nvidia-settings" GUI program (this is on Linux, Ubuntu 10.04.1 to be exact) but didn't see anything for that, just a temperature meter.
[QUOTE=mdettweiler;252504]Could you possibly post a few PSP first-pass residues and the amount of time it took to run the tests (and on what hardware)? Then I could run them on Gary's GPU and compare speeds (since I already have a working llrCUDA setup).[/QUOTE]
Here are some results from the DC effort, so the residues are already verified. The machine is an i7 920 @ 2.67 GHz with seven threads running.
[CODE][2010-12-24 19:44:47 WEST] Candidate: 237019*2^6100018+1 Program: llr.exe Residue: 34687837ED148D74 Time: 95109 seconds
[2010-12-25 21:38:49 WEST] Candidate: 237019*2^6100630+1 Program: llr.exe Residue: 381782D8C112D665 Time: 93141 seconds
[2010-12-27 00:15:56 WEST] Candidate: 237019*2^6101242+1 Program: llr.exe Residue: 1A0C5A3E8372AFC2 Time: 95784 seconds
[2010-12-28 08:59:07 WEST] Candidate: 237019*2^6102286+1 Program: llr.exe Residue: 4E2920067DCBCF57 Time: 91167 seconds[/CODE]
[QUOTE=Honza;252569]I dared to try, Win 7 x64/GTX 580.
[CODE]Starting Lucas Lehmer Riesel prime test of 1000065*2^390927-1
Using rational base DWT and generic reduction, FFT length = 65536
V1 = 5 ; Computing U0...
V1 = 5 ; Computing U0...done.
Starting Lucas-Lehmer loop...
1000065*2^390927-1 is prime!  Time : 296.914 sec.
Time per iteration : 0.758 ms.[/CODE][/QUOTE]
Which driver version do you use? At least we now know that the software can run on Windows in principle. The next step will be to find out why it uses that much CPU. When msft publishes a new version, I will make a new version as well.
[QUOTE=mdettweiler;252580]BTW, how do you check the GPU usage %? I looked in the "nvidia-settings" GUI program (this is on Linux, Ubuntu 10.04.1 to be exact) but didn't see anything for that, just a temperature meter.[/QUOTE]
Latest GPU-Z.
[QUOTE=ltd;252586]Which driver version do you use?
At least we know now that the software can run on windows in principle. Next step will be to find out why it uses that much CPU. When msft publishes a new version I will make a new version also.[/QUOTE]
Driver version 263.06. I *think* pre-260 doesn't support the GTX 5-series. Yeah, CPU usage: for comparison, tpsieve uses almost no CPU.
[QUOTE=Ken_g6;251846]I also looked, and CUDA flags are never set, so blocking sync is never enabled.[/QUOTE]High CPU usage might be fixed by changing this. However, GeneferCUDA does have this enabled (except in the most recent version) and has high CPU usage on my Linux machine with CUDA 3.1 anyway. :unsure:
Edit: Oh, and there's a known bug in the 260.* Linux drivers with blocking sync, but that code snippet I posted earlier might get around it.
[CODE]llr:          loop: mul_two_to_phi fft mul fft mul_two_to_minusphi normalize
MacLucasFFTW: loop: fft mul fft mul_two_to_phi mul_two_to_minusphi normalize[/CODE]
The placement of mul_two_to_phi was the cause of the poor performance.
Original 0.48:
[CODE]3*2^382449+1 is prime!  Time : 189.837 sec.
3*2^414840-1 is prime!  Time : 197.764 sec.
9999*2^458051+1 is prime!  Time : 419.483 sec.
1000065*2^390927-1 is prime!  Time : 330.561 sec.[/CODE]
Modify:
[CODE]set_fftlen:
...
    FFTLEN /= 2;
    return (zpad);
}
[/CODE]
Modified 0.48:
[CODE]3*2^382449+1 is prime!  Time : 141.611 sec.
3*2^414840-1 is prime!  Time : 138.820 sec.
9999*2^458051+1 is prime!  Time : 264.953 sec.
1000065*2^390927-1 is prime!  Time : 201.132 sec.[/CODE]
Interesting?
Hmm, this turned out really well. When I tried testing one of ltd's large PSP numbers with 0.48, I got a segmentation fault:
[code]gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1
Starting Proth prime test of 237019*2^6100018+1
Using complex irrational base DWT, FFT length = 2097152, a = 3
Segmentation fault[/code]
The segfault occurred while I was asleep, so I'm not sure exactly where in the test it happened. I suspect it was at the very end, since the test was due to finish sometime during the night. If anyone else would like to try to confirm this, that would be great. Meanwhile, I'll take a whack at the second number ltd gave to see if it also segfaults.
[CODE]llrcuda.0.48$ gdb ./llrCUDA
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) r -d -q237019*2^6100018+1
Starting program: /home/msft/llr.cuda/049/llrcuda.0.48/llrCUDA -d -q237019*2^6100018+1
[Thread debugging using libthread_db enabled]
[New Thread 0x7f5c4cce7700 (LWP 14376)]
Starting Proth prime test of 237019*2^6100018+1
^C7019*2^6100018+1, bit: 230000 / 6100035 [3.77%]. Time per bit: 5.254 ms.[/CODE]
I guess the problem is in giants.cu; this giants.cu doesn't support such large numbers. (You can compare it with gwnum/giants.c.)