mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

LaurV 2012-03-12 18:01

[QUOTE=flashjh;292731]Attached v1.65 x64 binaries (untested): [LIST][*]CUDA 4.0 / SM 2.0[*]CUDA 4.1 / SM 2.0[*]CUDA 4.1 / SM 2.1[/LIST]EDIT: Just tried running 1.65 4.1 | 2.0 and it quit right after displaying the initial startup stuff. I switched back to 1.64 because I have to go to work.[/QUOTE]
Up and running, thanks to both of you, msft and Jerry. I tested all possible thread settings; the fastest on the gtx580 is 512. 1024 brings a small penalty, no idea why: theoretically the threads would also queue with 512, because even though there are 512 cores, they would never all work at the same time for CL alone.

The -agressive switch is OK and works perfectly: it brings a bit more speed (as in 15% more!!), but the computer is less responsive (as argued before).

I love the default variant (polite): it is slower, but the computer is more responsive and at least Mrs LaurV can write her mails... :P so this part of the headache is gone... :smile:. Because of that, we will ignore the fact that the spelling of aggressive is wrong :P (anyhow, "-a" would suffice too)

We started testing 45221537 (see the earlier discussion) on TWO cards (cheap gtx580, 1.5G mem, 782MHz clock, no OC).

[B]We love the output format![/B] (it shows error=, real time=, eta=, 4 decimals for ms/iter, wonderful!)

[B]We love the [-s backup] switch[/B], we can now arrange things in the folder as we like. It works, we tried it.

[B]We love the speed[/B], we get between 4.5 and 5.1 ms/iter, without -t, with 512 threads and the default FFT size. That is faster than before, when we got 5.3-5.6 ms/iter on average.

[B]We don't love the fact that older checkpoints are not compatible with the new version[/B]. That is why the test had to be restarted from scratch (it was close to finishing, maybe 10-20 hours to go; now we need to wait again about 60 hours or more). This is minor, there will not be many cases where one has to restart "old tests". If you have old tests running that are, say, half through, better let them finish before updating. Otherwise it is worth restarting, v1.65 is soooo much nicer!

[B]It is not possible to test small expos anymore[/B]. We don't love that either, but considering that we only need to test big expos... well... to take the billion digit prize, say... that is satisfactory :smile:

[B][COLOR=Red]We don't love -f switch[/COLOR][/B], because we don't know what values are allowed. Some documentation would be nice, if not all values are accepted. We understand the "use it at your own risk" idea, but we don't like programs crashing... We tried random values smaller than the default. The idea was that since the error for this expo is 0.07 (at its default FFT size of 2621440), a slightly smaller FFT, one for which the error could go to 0.1 or even 0.2 or thereabouts, would speed things up a bit. But all the values we tried resulted in CL crashing with "unhandled exception, please report to microsoft".

The good news is that since we started writing this mail, we got 30 rows of text in each window, and all residues match what we saved in the previous run with 1.64. We think we will stop one card and give it some mfaktc to do, and let only one finish this expo.

msft 2012-03-12 19:37

[QUOTE=LaurV;292756][B][COLOR=Red]We don't love -f switch[/COLOR][/B], because we don't know what values are allowed. Some documentation would be nice, if not all values are accepted.[/QUOTE]
multiples of 32768 (threads=256)
multiples of 65536 (threads=512)
multiples of 131072 (threads=1024)

flashjh 2012-03-13 01:38

From CUDALucas.cu: smallest exponent is now 6,972,593

[CODE]
if (q < 6972593)
    printf (" too small Exponent %d\n", q);
[/CODE]

msft 2012-03-13 03:09

[QUOTE=Karl M Johnson;292742]Why?
CUDALucas no longer accepts small exponents?[/QUOTE]
[code]
normalize2_kernel <<< N / threads / 128, 128 >>> (g_x, threads, bigAB, bigAB, g_err, g_carry, N, error_log, g_inv, g_ttp, g_ttmp, g_inv2, g_ttp2, g_ttmp2, g_inv3, g_ttp3, g_ttmp3);
[/code]
threads = 1024
1024 * 128 = 131072
131072 is the minimum FFT length.

msft 2012-03-13 03:14

[QUOTE=apsen;292627]That does not match first time test. I guess I better rerun it with P95.[/QUOTE]
[code]
Verified test results
Exponent User name Computer name Residue Date found
29198173 msft Manual testing 6FD7E4D6557F5B77 2012-03-12 00:48
29198173 msft 6FD7E4D6557F5B77 2012-03-13 02:50
[/code]
What's on your mind?

Dubslow 2012-03-13 03:16

[QUOTE=msft;292827][code]
normalize2_kernel <<< N / threads / 128, 128 >>> (g_x, threads, bigAB, bigAB, g_err, g_carry, N, error_log, g_inv, g_ttp, g_ttmp, g_inv2, g_ttp2, g_ttmp2, g_inv3, g_ttp3, g_ttmp3);
[/code]
threads = 1024
1024 * 128 = 131072
131072 is the minimum FFT length.[/QUOTE]

Two suggestions:

1) Check the thread count before checking the exponent (so that with threads = 512 you can do a 64K FFT, or a 32K FFT with 256 threads).
1b) Select the total number of threads after getting the exponent to test (perhaps with a warning about low GPU utilization)

2) Even if 1024 threads is selected, you can just continue the test anyway (but perhaps warn the user that below a certain threshold the efficiency will drop massively).

(Obviously how these choices would interact with a manually selected FFT size or thread count would have to be figured out, but this is just to get the ball rolling.)


[QUOTE=msft;292830][code]
Verified test results
Exponent User name Computer name Residue Date found
29198173 msft Manual testing 6FD7E4D6557F5B77 2012-03-12 00:48
29198173 msft 6FD7E4D6557F5B77 2012-03-13 02:50
[/code]
What's on your mind?[/QUOTE]
Were those both done on GPU, or was one done on Prime95? (When apsen first posted his reply, only one of your tests was visible, so no one was able to tell that there was a match.)

flashjh 2012-03-13 04:33

[QUOTE=flashjh;292731]EDIT: Just tried running 1.65 4.1 | 2.0 and it quit right after displaying the initial startup stuff. I switched back to 1.64 because I have to go to work.[/QUOTE]

I have everything running on 1.65 now. Who knows why it wouldn't work? I was in a hurry so I probably had a switch set wrong.

Thanks for the updates msft.

Karl M Johnson 2012-03-13 07:26

Ok, so now the first exponent to run DCs on is 6972593.
Okay:smile:
That's 2h here.

Karl M Johnson 2012-03-13 09:44

Used latest binaries, cuda 4.1, sm_20(thanks!)
[CODE]M( 6972593 )P, n = 393216, CUDALucas v1.65
[/CODE]

flashjh 2012-03-14 02:46

CL 1.65 success
 
[CODE]Processing result: M( 26071663 )C, 0x48620a8eaadcaeb7, n = 1572864, CUDALucas v1.65
LL test successfully completes double-check of M26071663
[/CODE]

EDIT: Attached full run .txt file with all results.

EDIT2: It's working really well now msft. Thanks everyone for all the work on this!

apsen 2012-03-14 13:50

[QUOTE=msft;292830][code]
Verified test results
Exponent User name Computer name Residue Date found
29198173 msft Manual testing 6FD7E4D6557F5B77 2012-03-12 00:48
29198173 msft 6FD7E4D6557F5B77 2012-03-13 02:50
[/code]
What's on your mind?[/QUOTE]

I mean GIMPS reports this:

[CODE]Unverified LL 3F6F8AA0E00307__ by "Olaf Fiebig"[/CODE]

