mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   llrCUDA (https://www.mersenneforum.org/showthread.php?t=14608)

Honza 2011-02-20 15:28

Another try on one of my latest SGS

[CODE]>llrcuda.exe -d -q"727682426205*2^666670-1"
Starting Lucas Lehmer Riesel prime test of 727682426205*2^666670-1
Using rational base DWT and generic reduction, FFT length = 65536
V1 = 21 ; Computing U0...
Iter: 19/40, ERROR: ROUND OFF (0.5) > 0.4
Continuing from last save file.
Resuming test of 727682426205*2^666670-1 (computing U0) at iteration 3 [7.50%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.[/CODE]

I wonder what the limits are in terms of n and k (and their combination)...

pschoefer 2011-02-20 15:30

[QUOTE=ltd;253141]Can somebody check whether this version runs better on Windows?
I tried setting some CUDA flags.

[URL="http://www.psp-project.de/test/llrcuda_flag.rar"]www.psp-project.de/test/llrcuda_flag.rar[/URL][/QUOTE]
It is even worse while the CPU is under full load: it needs almost exactly the same CPU time, and the average GPU load is lower (so the total runtime is longer). :sad:

I can also confirm the crash with 8759318*2^8759318-1.

One positive note: Checkpointing is working.

ltd 2011-02-20 15:35

Thanks for the test.
I expected a longer GPU time but hoped for significantly better CPU performance.
In the end it was a nice try but a bad idea.

mdettweiler 2011-02-20 20:11

It worked! :grin: With 0.52, I was able to successfully complete one of the PSP n=~6M tests that failed earlier under 0.48:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ time ./llrCUDA -d -q237019*2^6100018+1
Starting Proth prime test of 237019*2^6100018+1
Using complex irrational base DWT, FFT length = 1048576, a = 3
237019*2^6100018+1 is not prime. Proth RES64: 803F9AB12897EAA3 Time : 30976.120 sec.

real 516m33.249s
user 197m59.120s
sys 203m48.490s
[/code]
Only problem is, it got the residue wrong. Here's ltd's original result:
[code]
[2010-12-24 19:44:47 WEST] Candidate: 237019*2^6100018+1 Program: llr.exe Residue: 34687837ED148D74 Time: 95109 seconds
[/code]
Since this result has previously been doublechecked on a CPU, it seems the GPU was the one that got it wrong. msft, any idea what might have caused this? Might it be an unstable GPU, or is it more likely something in the program?
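For anyone who wants to automate this kind of doublecheck, here is a minimal sketch that pulls the 64-bit residue out of the two line formats quoted above and compares them. The function name final_residue and the regular expression are my own assumptions, written against the log lines in this post only; they are not part of llrCUDA or LLR.

```python
import re

# Assumption: the residue is always 16 hex digits after "RES64:" (llrCUDA/LLR
# console output) or "Residue:" (the result-log line quoted in this post).
RES_RE = re.compile(r"(?:RES64|Residue):\s*([0-9A-Fa-f]{16})")

def final_residue(line):
    """Return the upper-cased residue found in a result line, or None."""
    match = RES_RE.search(line)
    return match.group(1).upper() if match else None

# Both lines are copied from the post above.
gpu_line = ("237019*2^6100018+1 is not prime. "
            "Proth RES64: 803F9AB12897EAA3 Time : 30976.120 sec.")
cpu_line = ("[2010-12-24 19:44:47 WEST] Candidate: 237019*2^6100018+1 "
            "Program: llr.exe Residue: 34687837ED148D74 Time: 95109 seconds")

residues_match = final_residue(gpu_line) == final_residue(cpu_line)
print(residues_match)  # False: the GPU and CPU residues disagree
```

A mismatch only says that the two runs disagree; deciding which one is wrong still needs an independent doublecheck like the CPU one mentioned above.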

I'm going to try another of these PSP tests with the latest llrCUDA 0.55. That way we can see if these incorrect residues happen all the time, or just sometimes.

msft 2011-02-21 05:08

1 Attachment(s)
[CODE]
$ tar -xvf llrcuda.0.55.tar.bz2
$ cd llrcuda.0.55
$ mv Llr.c.10 Llr.c
$ make
$ ./llrCUDA -d -q237019*2^6100018+1
Starting Proth prime test of 237019*2^6100018+1

237019*2^6100018+1 is not prime. Proth RES64: 1D660D1276F1E802 Time : 120.981 sec.
[/CODE]
With this you get the result after 10 iterations.
The attached file is for llr384src.zip, so you can compare.

mdettweiler 2011-02-21 05:31

Another one bites the dust (this time with 0.55):
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ time ./llrCUDA -d -q237019*2^6100630+1
Starting Proth prime test of 237019*2^6100630+1
Using complex irrational base DWT, FFT length = 1048576, a = 3
237019*2^6100630+1 is not prime. Proth RES64: 9332D0532C5E9A9D Time : 30984.745 sec.

real 516m40.783s
user 195m53.130s
sys 205m45.830s
[/code]
versus:
[code]
[2010-12-25 21:38:49 WEST] Candidate: 237019*2^6100630+1 Program: llr.exe Residue: 381782D8C112D665 Time: 93141 seconds
[/code]
I'll try the new llr.c and see how it works with that.

mdettweiler 2011-02-21 09:07

Here's what I get with Llr.c.10:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 is not prime. Proth RES64: 1D660D1276F1E802 Time : 139.622 sec.
[/code]
So it seems that I do get the same result as you from this. (I'm assuming this is supposed to produce an interim residue after 10 iterations?)

There seems, though, to be an easier way to get interim residues that doesn't require recompiling the program:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=10000
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 interim residue 7FCCC09ECD1E2670 at bit 10000 19.574 ms.
237019*2^6100018+1 interim residue 7FCCC09ECD1E2670 at bit 10001
237019*2^6100018+1 interim residue 49DF7F430637E4C4 at bit 20000 5.061 ms.
237019*2^6100018+1 interim residue 49DF7F430637E4C4 at bit 20001
237019*2^6100018+1 interim residue 44485CE353854AA4 at bit 30000 5.072 ms.
237019*2^6100018+1 interim residue 44485CE353854AA4 at bit 30001
237019*2^6100018+1 interim residue 088565E9F724CB11 at bit 40000 5.069 ms.
237019*2^6100018+1 interim residue 088565E9F724CB11 at bit 40001
237019*2^6100018+1 interim residue B9517CED877AB1E2 at bit 50000 5.072 ms.
237019*2^6100018+1 interim residue B9517CED877AB1E2 at bit 50001
237019*2^6100018+1 interim residue FC08DFA58F1A8CFB at bit 60000 5.080 ms.
237019*2^6100018+1 interim residue FC08DFA58F1A8CFB at bit 60001
237019*2^6100018+1 interim residue C259A339BF77A90B at bit 70000 5.055 ms.
237019*2^6100018+1 interim residue C259A339BF77A90B at bit 70001
237019*2^6100018+1 interim residue 8B0B3739CE360F0C at bit 80000 5.065 ms.
(etc.)
[/code]
It's not documented in readme.txt, but it seems to work the same way as in Prime95/mprime.

When I try it with -oInterimResidues=10, I get:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=10
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 interim residue E63AA0B720E5DF93 at bit 103
237019*2^6100018+1 interim residue 84DAFF6980BE4B3B at bit 11
237019*2^6100018+1 interim residue 80AB73E05DD08B71 at bit 20
237019*2^6100018+1 interim residue F592BAF2FD93E7E1 at bit 21
237019*2^6100018+1 interim residue B6F593AF43A78AA1 at bit 30
237019*2^6100018+1 interim residue 2498B0065EE50090 at bit 31
237019*2^6100018+1 interim residue 42E48BE7FE1CF245 at bit 40
237019*2^6100018+1 interim residue 9EE7200EC8058A4C at bit 41
237019*2^6100018+1 interim residue 2D9094B4BA6A7FC1 at bit 50
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 51
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 60
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 61
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 70
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 71
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 80
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 81
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 90
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 91
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 100
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 101
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 110
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 111
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 120
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 121

Caught signal. Terminating.
237019*2^6100018+1 interim residue 30622E65F77D08E9 at bit 130
[/code]
So I'm not sure, but there may be a bug in the InterimResidues option that causes goofy things to happen when residues are displayed at a high frequency (for instance, every 10 iterations rather than every 10000). For one, it referred to the first residue as "bit 103" even though it was clearly supposed to be bit 10; this looks like a cosmetic bug. And from bit 51 through bit 121, it gave the same residue over and over, even across multiple iterations, which doesn't make sense at all. :huh: I'm guessing this issue is probably in the original LLR too, and not just in llrCUDA (though I admittedly haven't tried it with CPU LLR).
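A stuck residue like this is easy to miss in a long log, so here is a rough sketch of a checker that flags runs of identical interim residues. The helper name stuck_runs and the threshold min_len=3 are assumptions of mine; the line format is taken from the output above.

```python
import re

# Line format as printed in the logs above:
#   <candidate> interim residue <16 hex digits> at bit <n>
LINE_RE = re.compile(r"interim residue ([0-9A-Fa-f]{16}) at bit (\d+)")

def stuck_runs(log_text, min_len=3):
    """Return (residue, first_bit, last_bit) for each run of at least
    min_len consecutive log lines that report the same residue."""
    entries = [(m.group(1).upper(), int(m.group(2)))
               for m in LINE_RE.finditer(log_text)]
    runs = []
    start = 0
    for i in range(1, len(entries) + 1):
        # Close the current run at the end of the log or when the residue changes.
        if i == len(entries) or entries[i][0] != entries[start][0]:
            if i - start >= min_len:
                runs.append((entries[start][0], entries[start][1], entries[i - 1][1]))
            start = i
    return runs

# Shortened excerpt of the -oInterimResidues=10 output above.
excerpt = """\
237019*2^6100018+1 interim residue 2D9094B4BA6A7FC1 at bit 50
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 51
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 60
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 61
237019*2^6100018+1 interim residue FB7D1F91D3EF9546 at bit 70
237019*2^6100018+1 interim residue 30622E65F77D08E9 at bit 130
"""

print(stuck_runs(excerpt))  # [('FB7D1F91D3EF9546', 51, 70)]
```

With min_len=3 a single repeated pair (the same residue at bits N and N+1) is not flagged; lower the threshold to 2 if you want to catch those as well.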

msft 2011-02-21 13:33

1 Attachment(s)
Fixed the -oInterimResidues bug.

The cause of the 237019*2^6100018+1 bug is the FFT length.

Deleting "if(b==2) FFTLEN/=2;" fixes the bug.

[CODE]
llrcuda.0.57$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=1000
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 interim residue C62C21499E54D684 at bit 1000
237019*2^6100018+1 interim residue 625A6248336F4C8B at bit 1001
237019*2^6100018+1 interim residue FEDB3F798612F804 at bit 2000
237019*2^6100018+1 interim residue 604B813A71B7C223 at bit 2001
237019*2^6100018+1 interim residue 9CE5D33781C59C5B at bit 3000
237019*2^6100018+1 interim residue 494344DB7C530074 at bit 3001
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 6EDA0D50AA8D3F2F at bit 5000
237019*2^6100018+1 interim residue 9F3692C36C2DE763 at bit 5001
237019*2^6100018+1 interim residue 956FBF35B0790F4D at bit 6000
237019*2^6100018+1 interim residue 5B6369DF8B6A413F at bit 6001
237019*2^6100018+1 interim residue B822AF2F37463961 at bit 7000
237019*2^6100018+1 interim residue 981A48E058A0F469 at bit 7001
237019*2^6100018+1 interim residue FAEC34D93CEF7AE8 at bit 8000
237019*2^6100018+1 interim residue 6343B969AC8EFCCE at bit 8001
237019*2^6100018+1 interim residue C5639EF45DAC3528 at bit 9000
237019*2^6100018+1 interim residue 5EA8B6354926B9FE at bit 9001
237019*2^6100018+1 interim residue 94F29CC992FFFF09 at bit 10000 23.907 ms.
[/CODE]

msft 2011-02-21 17:21

1 Attachment(s)
This may fix the bug.

mdettweiler 2011-02-21 19:57

With 0.59:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=1000
Resuming Proth prime test of 237019*2^6100018+1 at bit 581 [0.00%]
237019*2^6100018+1 interim residue C62C21499E54D684 at bit 1000
237019*2^6100018+1 interim residue 625A6248336F4C8B at bit 1001
237019*2^6100018+1 interim residue FEDB3F798612F804 at bit 2000
237019*2^6100018+1 interim residue 604B813A71B7C223 at bit 2001
237019*2^6100018+1 interim residue 9CE5D33781C59C5B at bit 3000
237019*2^6100018+1 interim residue 494344DB7C530074 at bit 3001
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
[B]237019*2^6100018+1 interim residue 7EF55A8F1C9B7D42 at bit 5000[/B]
237019*2^6100018+1 interim residue 01002FB5DED459DC at bit 5001
237019*2^6100018+1 interim residue 12674737722E5123 at bit 6000
237019*2^6100018+1 interim residue 7541808760605DC7 at bit 6001
237019*2^6100018+1 interim residue C0FC09DFE7BDA651 at bit 7000
237019*2^6100018+1 interim residue 12FAF6191C09A16F at bit 7001
237019*2^6100018+1 interim residue 9E56883D47A937CB at bit 8000
237019*2^6100018+1 interim residue DD45B6D93EDF3125 at bit 8001
237019*2^6100018+1 interim residue 1EE03FF8859BE7E7 at bit 9000
237019*2^6100018+1 interim residue BD46F256E47E476A at bit 9001
237019*2^6100018+1 interim residue A3BFDF08B2813649 at bit 10000 28.845 ms.
[/code]
It seems that a discrepancy with your 0.57 output creeps in somewhere between iterations 4001 and 5000.

When I re-run it with -oErrorCheck=1 to force error checking on every iteration:
[code]
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q237019*2^6100018+1 -oInterimResidues=1000 -oErrorCheck=1
Starting Proth prime test of 237019*2^6100018+1
237019*2^6100018+1 interim residue C62C21499E54D684 at bit 1000
237019*2^6100018+1 interim residue 625A6248336F4C8B at bit 1001
237019*2^6100018+1 interim residue FEDB3F798612F804 at bit 2000
237019*2^6100018+1 interim residue 604B813A71B7C223 at bit 2001
237019*2^6100018+1 interim residue 9CE5D33781C59C5B at bit 3000
237019*2^6100018+1 interim residue 494344DB7C530074 at bit 3001
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 7EF55A8F1C9B7D42 at bit 5000
237019*2^6100018+1 interim residue 01002FB5DED459DC at bit 5001
237019*2^6100018+1 interim residue 12674737722E5123 at bit 6000
237019*2^6100018+1 interim residue 7541808760605DC7 at bit 6001
[/code]
Interestingly enough, this run would seem to agree with my earlier one without -oErrorCheck. The only one that [I]doesn't[/I] agree is your run with 0.57.

To verify whether my run or yours (or neither!) was the correct one, I also ran a similar test using LLR 3.8.5 on a CPU:
[code]
$ ./cllr.exe -d -q237019*2^6100018+1 -oInterimResidues=1000
Starting Proth prime test of 237019*2^6100018+1
Using all-complex Core2 type-3 FFT length 576K, Pass1=768, Pass2=768, a = 3
237019*2^6100018+1 interim residue C62C21499E54D684 at bit 1000
237019*2^6100018+1 interim residue 625A6248336F4C8B at bit 1001
237019*2^6100018+1 interim residue FEDB3F798612F804 at bit 2000
237019*2^6100018+1 interim residue 604B813A71B7C223 at bit 2001
237019*2^6100018+1 interim residue 9CE5D33781C59C5B at bit 3000
237019*2^6100018+1 interim residue 494344DB7C530074 at bit 3001
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 7EF55A8F1C9B7D42 at bit 5000
237019*2^6100018+1 interim residue 01002FB5DED459DC at bit 5001
237019*2^6100018+1 interim residue 12674737722E5123 at bit 6000
237019*2^6100018+1 interim residue 7541808760605DC7 at bit 6001
237019*2^6100018+1 interim residue C0FC09DFE7BDA651 at bit 7000
237019*2^6100018+1 interim residue 12FAF6191C09A16F at bit 7001
237019*2^6100018+1 interim residue 9E56883D47A937CB at bit 8000
237019*2^6100018+1 interim residue DD45B6D93EDF3125 at bit 8001
237019*2^6100018+1 interim residue 1EE03FF8859BE7E7 at bit 9000
237019*2^6100018+1 interim residue BD46F256E47E476A at bit 9001
237019*2^6100018+1 interim residue A3BFDF08B2813649 at bit 10000 23.748 ms.
[/code]
So it looks like my run was the correct one. (Might the error in your test be due to the bug you fixed in 0.59?)
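Comparing these logs by eye gets tedious, so here is a small sketch that parses two interim-residue logs and reports the first bit at which they disagree. The names parse_residues and first_divergence are hypothetical helpers of my own, and the parsing assumes the exact line format shown in the logs above.

```python
import re

# Line format as printed in the logs above:
#   <candidate> interim residue <16 hex digits> at bit <n>
LINE_RE = re.compile(r"interim residue ([0-9A-Fa-f]{16}) at bit (\d+)")

def parse_residues(log_text):
    """Return {bit: residue} for every interim-residue line in a log."""
    residues = {}
    for match in LINE_RE.finditer(log_text):
        residues[int(match.group(2))] = match.group(1).upper()
    return residues

def first_divergence(log_a, log_b):
    """Return (bit, residue_a, residue_b) for the lowest bit present in both
    logs whose residues differ, or None if all shared bits agree."""
    a, b = parse_residues(log_a), parse_residues(log_b)
    for bit in sorted(a.keys() & b.keys()):
        if a[bit] != b[bit]:
            return bit, a[bit], b[bit]
    return None

# Excerpts copied from msft's 0.57 run and the CPU run above.
run_057 = """\
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 6EDA0D50AA8D3F2F at bit 5000
"""
run_cpu = """\
237019*2^6100018+1 interim residue C94D8FF4BC33622B at bit 4000
237019*2^6100018+1 interim residue 959388D7E56FC990 at bit 4001
237019*2^6100018+1 interim residue 7EF55A8F1C9B7D42 at bit 5000
"""

print(first_divergence(run_057, run_cpu))
# (5000, '6EDA0D50AA8D3F2F', '7EF55A8F1C9B7D42')
```

The reported bit is only an upper bound on where the error happened: with -oInterimResidues=1000 the actual divergence can be anywhere after the last matching residue (here, after bit 4001).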

I'm now going to try running an entire test for 237019*2^6100018+1 using 0.59. We shall see if the final result matches ltd's. :smile:

BTW, I see that this new version chooses the 2097152 FFT, versus 1048576 as before. Is this just to fix the InterimResidues bug, or is this actually needed to ensure integrity of a test this size?

msft 2011-02-22 01:29

[QUOTE=mdettweiler;253295]
BTW, I see that this new version chooses the 2097152 FFT, versus 1048576 as before. Is this just to fix the InterimResidues bug, or is this actually needed to ensure integrity of a test this size?[/QUOTE]
We can get information from llr.
[CODE]
$ ./llr -d -q237019*2^6100018+1
Resuming Proth prime test of 237019*2^6100018+1 at bit 43 [0.00%]
Using all-complex Core2 type-3 FFT length 576K, Pass1=768, Pass2=768, a = 3
[/CODE]
The FFT length is over 512K, so we need the 2097152 FFT.

