Another try on one of my latest SGS
[CODE]>llrcuda.exe -d -q"727682426205*2^666670-1"
Starting Lucas Lehmer Riesel prime test of 727682426205*2^666670-1
Using rational base DWT and generic reduction, FFT length = 65536
V1 = 21 ; Computing U0...
Iter: 19/40, ERROR: ROUND OFF (0.5) > 0.4
Continuing from last save file.
Resuming test of 727682426205*2^666670-1 (computing U0) at iteration 3 [7.50%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.[/CODE]
I wonder what the limits are in terms of n and k (and their combination)... |
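For anyone unfamiliar with the `ROUND OFF (0.5) > 0.4` message: in FFT-based multiplication every output word should land very close to an integer, and the round-off error is the largest distance from any word to its nearest integer. Below is a minimal Python sketch of that check. The function names are hypothetical and this is not llrCUDA's actual code; the 0.4 limit is simply taken from the log above.

```python
def max_roundoff(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def check_roundoff(values, limit=0.4):
    """Return (error, ok); ok is False when the error exceeds the limit."""
    err = max_roundoff(values)
    return err, err <= limit


# Hypothetical FFT outputs: the first set is close to integers, the second
# contains a word exactly halfway between two integers.
print(check_roundoff([3.0, 7.02, -1.97]))
print(check_roundoff([12.0, 4.5]))
```

An error of 0.5 means a word was exactly halfway between two integers, so the program cannot tell which integer was intended, and that is why it retries the iteration.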
[QUOTE=ltd;253141]Can somebody try if this version runs better on Windows.
I try to set some cuda flags. [URL="http://www.psp-project.de/test/llrcuda_flag.rar"]www.psp-project.de/test/llrcuda_flag.rar[/URL][/QUOTE]
It's even worse while the CPU is under full load. It needs almost exactly the same CPU time, and the average GPU load is lower (so the total runtime is longer). :sad:
I can also confirm the crash with 8759318*2^8759318-1.
One positive note: checkpointing is working. |
Thanks for the test.
I expected a longer GPU time but hoped for significantly better CPU performance. In the end it was a nice try but a bad idea. |
It worked! :grin: With 0.52, I was able to successfully complete one of the PSP n=~6M tests that failed earlier under 0.48:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ time ./llrCUDA -d -q237019*2^6100018+1
Starting Proth prime test of 237019*2^6100018+1
Using complex irrational base DWT, FFT length = 1048576, a = 3
237019*2^6100018+1 is not prime. Proth RES64: 803F9AB12897EAA3 Time : 30976.120 sec.

real 516m33.249s
user 197m59.120s
sys 203m48.490s
[/code]
Only problem is, it got the residue wrong. Here's ltd's original result:
[code]
[2010-12-24 19:44:47 WEST] Candidate: 237019*2^6100018+1 Program: llr.exe Residue: 34687837ED148D74 Time: 95109 seconds
[/code]
Since this result has previously been doublechecked on a CPU, it seems the GPU was the one that got it wrong. msft, any idea what might have caused this? Might it be an unstable GPU, or is it more likely something in the program?

I'm going to try another of these PSP tests with the latest llrCUDA 0.55. That way we can see if these incorrect residues happen all the time, or just sometimes. |
1 Attachment(s)
[CODE]
$ tar -xvf llrcuda.0.55.tar.bz2
$ cd llrcuda.0.55
$ mv Llr.c.10 Llr.c
$ make
$ ./llrCUDA -d -q237019*2^6100018+1
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 is not prime. Proth RES64: 1D660D1276F1E802 Time : 120.981 sec.
[/CODE]
This way you get the result after 10 iterations. I have attached llr384src.zip so you can compare. |
Another one bites the dust (this time with 0.55):
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ time ./llrCUDA -d -q237019*2^6100630+1
Starting Proth prime test of 237019*2^6100630+1
Using complex irrational base DWT, FFT length = 1048576, a = 3
237019*2^6100630+1 is not prime. Proth RES64: 9332D0532C5E9A9D Time : 30984.745 sec.

real 516m40.783s
user 195m53.130s
sys 205m45.830s
[/code]
versus:
[code]
[2010-12-25 21:38:49 WEST] Candidate: 237019*2^6100630+1 Program: llr.exe Residue: 381782D8C112D665 Time: 93141 seconds
[/code]
I'll try the new llr.c and see how it works with that. |
Here's what I get with Llr.c.10:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 is not prime. Proth RES64: 1D660D1276F1E802 Time : 139.622 sec.
[/code]
So it seems that I do get the same result as you from this. (I'm assuming this is supposed to produce an interim residue after 10 iterations?)

There seems, though, to be an easier way to get interim residues that doesn't require recompiling the program:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=10000
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 interim residue 7FCCC09ECD1E2670 at bit 10000 19.574 ms.
237019*2^6100018+1 interim residue 7FCCC09ECD1E2670 at bit 10001
237019*2^6100018+1 interim residue 49DF7F430637E4C4 at bit 20000 5.061 ms.
237019*2^6100018+1 interim residue 49DF7F430637E4C4 at bit 20001
237019*2^6100018+1 interim residue 44485CE353854AA4 at bit 30000 5.072 ms.
237019*2^6100018+1 interim residue 44485CE353854AA4 at bit 30001
237019*2^6100018+1 interim residue 088565E9F724CB11 at bit 40000 5.069 ms.
237019*2^6100018+1 interim residue 088565E9F724CB11 at bit 40001
237019*2^6100018+1 interim residue B9517CED877AB1E2 at bit 50000 5.072 ms.
237019*2^6100018+1 interim residue B9517CED877AB1E2 at bit 50001
237019*2^6100018+1 interim residue FC08DFA58F1A8CFB at bit 60000 5.080 ms.
237019*2^6100018+1 interim residue FC08DFA58F1A8CFB at bit 60001
237019*2^6100018+1 interim residue C259A339BF77A90B at bit 70000 5.055 ms.
237019*2^6100018+1 interim residue C259A339BF77A90B at bit 70001
237019*2^6100018+1 interim residue 8B0B3739CE360F0C at bit 80000 5.065 ms.
(etc.)
[/code]
It's not documented in readme.txt, but it seems to work the same way as in Prime95/mprime.
When I try it with -oInterimResidues=10, I get:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=10
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 interim residue E63AA0B720E5DF93 at bit 103
237019*2^6100018+1 interim residue 84DAFF6980BE4B3B at bit 11
237019*2^6100018+1 interim residue 80AB73E05DD08B71 at bit 20
237019*2^6100018+1 interim residue F592BAF2FD93E7E1 at bit 21
237019*2^6100018+1 interim residue B6F593AF43A78AA1 at bit 30
237019*2^6100018+1 interim residue 2498B0065EE50090 at bit 31
237019*2^6100018+1 interim residue 42E48BE7FE1CF245 at bit 40
237019*2^6100018+1 interim residue 9EE7200EC8058A4C at bit 41
237019*2^6100018+1 interim residue 2D9094B4BA6A7FC1 at bit 50
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 51
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 60
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 61
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 70
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 71
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 80
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 81
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 90
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 91
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 100
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 101
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 110
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 111
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 120
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 121
♥ Caught signal. Terminating.
237019*2^6100018+1 interim residue 30622E65F77D08E9 at bit 130
[/code]
So I'm not sure, but there may be some kind of bug with the InterimResidues option that causes goofy things to happen when you set it to display interim residues at a high frequency (for instance, every 10 iterations, rather than something much larger like 10000 iterations). For one, it referred to the first residue as "bit 103" even though it was clearly supposed to be bit 10; this looks like a cosmetic bug. And from bit 51 to 121, it constantly gave the same residue even over multiple iterations--which doesn't make sense at all. :huh:

Note that I'm guessing this apparent issue would probably be in the original LLR, and not just in llrCUDA (though I admittedly haven't tried it with CPU LLR). |
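Spotting a repeated residue by eye gets tedious in longer logs. Here is a small Python sketch that scans output like the above and reports every bit whose residue equals the one on the previous reported line. The function names are made up for illustration, and the regular expression assumes the `interim residue <16 hex digits> at bit <n>` layout seen in these posts.

```python
import re

# Matches lines such as "... interim residue FB7D1F91D3EF9546 at bit 60".
RESIDUE_RE = re.compile(r"interim residue ([0-9A-F]{16}) at bit (\d+)")


def parse_residues(log_text):
    """Return [(bit, residue), ...] in the order the lines appear."""
    return [(int(bit), res) for res, bit in RESIDUE_RE.findall(log_text)]


def stuck_bits(log_text):
    """Bits whose residue equals the residue on the previous reported line."""
    entries = parse_residues(log_text)
    return [bit for (_, prev), (bit, cur) in zip(entries, entries[1:])
            if prev == cur]


sample = """\
237019*2^6100018+1 interim residue 2D9094B4BA6A7FC1 at bit 50
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 51
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 60
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 61
"""
print(stuck_bits(sample))  # bits 60 and 61 repeat the residue first seen at bit 51
```

A non-empty result does not say where the bug is, only that the reported residue stopped changing between those lines.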
1 Attachment(s)
Fixed the oInterimResidues bug.
For 237019*2^6100018+1, the cause of the bug is the FFT length. Deleting the line "if(b==2) FFTLEN/=2;" fixes it.
[CODE]
llrcuda.0.57$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=1000
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 interim residue C62C21499E54D684 at bit 1000
237019*2^6100018+1 interim residue 625A6248336F4C8B at bit 1001
237019*2^6100018+1 interim residue FEDB3F798612F804 at bit 2000
237019*2^6100018+1 interim residue 604B813A71B7C223 at bit 2001
237019*2^6100018+1 interim residue 9CE5D33781C59C5B at bit 3000
237019*2^6100018+1 interim residue 494344DB7C530074 at bit 3001
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 6EDA0D50AA8D3F2F at bit 5000
237019*2^6100018+1 interim residue 9F3692C36C2DE763 at bit 5001
237019*2^6100018+1 interim residue 956FBF35B0790F4D at bit 6000
237019*2^6100018+1 interim residue 5B6369DF8B6A413F at bit 6001
237019*2^6100018+1 interim residue B822AF2F37463961 at bit 7000
237019*2^6100018+1 interim residue 981A48E058A0F469 at bit 7001
237019*2^6100018+1 interim residue FAEC34D93CEF7AE8 at bit 8000
237019*2^6100018+1 interim residue 6343B969AC8EFCCE at bit 8001
237019*2^6100018+1 interim residue C5639EF45DAC3528 at bit 9000
237019*2^6100018+1 interim residue 5EA8B6354926B9FE at bit 9001
237019*2^6100018+1 interim residue 94F29CC992FFFF09 at bit 10000 23.907 ms.
[/CODE] |
1 Attachment(s)
This may fix the bug.
|
With 0.59:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=1000
Resuming Proth prime test of 237019*2^6100018+1 at bit 581 [0.00%]
237019*2^6100018+1 interim residue C62C21499E54D684 at bit 1000
237019*2^6100018+1 interim residue 625A6248336F4C8B at bit 1001
237019*2^6100018+1 interim residue FEDB3F798612F804 at bit 2000
237019*2^6100018+1 interim residue 604B813A71B7C223 at bit 2001
237019*2^6100018+1 interim residue 9CE5D33781C59C5B at bit 3000
237019*2^6100018+1 interim residue 494344DB7C530074 at bit 3001
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
[B]237019*2^6100018+1 interim residue 7EF55A8F1C9B7D42 at bit 5000[/B]
237019*2^6100018+1 interim residue 01002FB5DED459DC at bit 5001
237019*2^6100018+1 interim residue 12674737722E5123 at bit 6000
237019*2^6100018+1 interim residue 7541808760605DC7 at bit 6001
237019*2^6100018+1 interim residue C0FC09DFE7BDA651 at bit 7000
237019*2^6100018+1 interim residue 12FAF6191C09A16F at bit 7001
237019*2^6100018+1 interim residue 9E56883D47A937CB at bit 8000
237019*2^6100018+1 interim residue DD45B6D93EDF3125 at bit 8001
237019*2^6100018+1 interim residue 1EE03FF8859BE7E7 at bit 9000
237019*2^6100018+1 interim residue BD46F256E47E476A at bit 9001
237019*2^6100018+1 interim residue A3BFDF08B2813649 at bit 10000 28.845 ms.
[/code]
It seems that a discrepancy creeps in somewhere between iteration 4001 and 5000.
When I re-run it with -oErrorCheck=1 to force error checking on every iteration:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=1000 -oErrorCheck=1
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 interim residue C62C21499E54D684 at bit 1000
237019*2^6100018+1 interim residue 625A6248336F4C8B at bit 1001
237019*2^6100018+1 interim residue FEDB3F798612F804 at bit 2000
237019*2^6100018+1 interim residue 604B813A71B7C223 at bit 2001
237019*2^6100018+1 interim residue 9CE5D33781C59C5B at bit 3000
237019*2^6100018+1 interim residue 494344DB7C530074 at bit 3001
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 7EF55A8F1C9B7D42 at bit 5000
237019*2^6100018+1 interim residue 01002FB5DED459DC at bit 5001
237019*2^6100018+1 interim residue 12674737722E5123 at bit 6000
237019*2^6100018+1 interim residue 7541808760605DC7 at bit 6001
[/code]
Interestingly enough, this run would seem to agree with my earlier one without -oErrorCheck. The only one that [I]doesn't[/I] agree is your run with 0.57.

To verify whether my run or yours (or neither!) was the correct one, I also ran a similar test using LLR 3.8.5 on a CPU:
[code]
$ ./cllr.exe -d -q237019*2^6100018+1 -oInterimResidues=1000
Starting Proth prime test of 237019*2^6100018+1
Using all-complex Core2 type-3 FFT length 576K, Pass1=768, Pass2=768, a = 3
237019*2^6100018+1 interim residue C62C21499E54D684 at bit 1000
237019*2^6100018+1 interim residue 625A6248336F4C8B at bit 1001
237019*2^6100018+1 interim residue FEDB3F798612F804 at bit 2000
237019*2^6100018+1 interim residue 604B813A71B7C223 at bit 2001
237019*2^6100018+1 interim residue 9CE5D33781C59C5B at bit 3000
237019*2^6100018+1 interim residue 494344DB7C530074 at bit 3001
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 7EF55A8F1C9B7D42 at bit 5000
237019*2^6100018+1 interim residue 01002FB5DED459DC at bit 5001
237019*2^6100018+1 interim residue 12674737722E5123 at bit 6000
237019*2^6100018+1 interim residue 7541808760605DC7 at bit 6001
237019*2^6100018+1 interim residue C0FC09DFE7BDA651 at bit 7000
237019*2^6100018+1 interim residue 12FAF6191C09A16F at bit 7001
237019*2^6100018+1 interim residue 9E56883D47A937CB at bit 8000
237019*2^6100018+1 interim residue DD45B6D93EDF3125 at bit 8001
237019*2^6100018+1 interim residue 1EE03FF8859BE7E7 at bit 9000
237019*2^6100018+1 interim residue BD46F256E47E476A at bit 9001
237019*2^6100018+1 interim residue A3BFDF08B2813649 at bit 10000 23.748 ms.
[/code]
So it looks like my run was the correct one. (Might the error in your test be due to the bug you fixed in 0.59?)

I'm now going to try running an entire test for 237019*2^6100018+1 using 0.59. We shall see if the final result matches ltd's. :smile:

BTW, I see that this new version chooses the 2097152 FFT, versus 1048576 as before. Is this just to fix the InterimResidues bug, or is this actually needed to ensure integrity of a test this size? |
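Comparing two interim-residue logs line by line can be automated. Below is a rough Python sketch, with hypothetical function names, that finds the first bit at which two runs report different residues. It assumes both logs use the `interim residue <hex> at bit <n>` format shown in this thread and only compares bits present in both.

```python
import re

# Matches lines such as "... interim residue 7EF55A8F1C9B7D42 at bit 5000".
RESIDUE_RE = re.compile(r"interim residue ([0-9A-F]{16}) at bit (\d+)")


def residues_by_bit(log_text):
    """Map bit number -> residue for every interim residue line."""
    return {int(bit): res for res, bit in RESIDUE_RE.findall(log_text)}


def first_divergence(log_a, log_b):
    """Lowest bit reported in both logs whose residues differ, or None."""
    a, b = residues_by_bit(log_a), residues_by_bit(log_b)
    for bit in sorted(a.keys() & b.keys()):
        if a[bit] != b[bit]:
            return bit
    return None


# Two lines each from the 0.57 run and the CPU run quoted in this thread.
run_057 = """\
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 6EDA0D50AA8D3F2F at bit 5000
"""
run_cpu = """\
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 7EF55A8F1C9B7D42 at bit 5000
"""
print(first_divergence(run_057, run_cpu))  # first reported bit where the runs disagree
```

The real error lies somewhere after the last matching bit and at or before the reported one, so rerunning with a smaller InterimResidues interval narrows it down.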
[QUOTE=mdettweiler;253295]
BTW, I see that this new version chooses the 2097152 FFT, versus 1048576 as before. Is this just to fix the InterimResidues bug, or is this actually needed to ensure integrity of a test this size?[/QUOTE]
We can get this information from llr.
[CODE]
$ ./llr -d -q237019*2^6100018+1
Resuming Proth prime test of 237019*2^6100018+1 at bit 43 [0.00%]
Using all-complex Core2 type-3 FFT length 576K, Pass1=768, Pass2=768, a = 3
[/CODE]
The FFT length is over 512K, so we need the 2097152 FFT. |