mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

chalsall 2013-12-03 21:22

[QUOTE=flashjh;361070]If you have the code working for Linux, can you commit/merge it with the changes on SourceForge so I can take a look at it on Windows?[/QUOTE]

Just putting this out there for thought...

If you're not having fun, perhaps you should be doing something different.

Clearly we're having fun here, even if some don't understand the interest, the work, or the humor....

owftheevil 2013-12-03 22:31

[QUOTE=chappjc;361071]From "GeForce GTX 580 fft.txt":

[CODE] 2048 38492887 2.9761
2160 40551479 3.5742
2240 42020509 3.6679
2304 43194913 3.6846
2592 48471289 3.9861
2880 53735041 4.6150
3072 57237889 4.9730
3136 58404433 4.9740[/CODE]Do you want to see the full output from -cufftbench 2592 2592 6?[/QUOTE]

Yes, that would be useful.
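
For anyone following along, here is a rough sketch of how a table like the one quoted above could be used to pick an fft length. It assumes the three columns are fft length in K, the largest exponent that length can handle, and msec per iteration; that reading of the columns is an assumption. The names `fft_row`, `gtx580_rows` and `pick_fft_k` are made up for illustration and are not taken from the CUDALucas source.

```c
#include <stddef.h>

/* One row of a benchmark table like the quoted excerpt.
   Column meanings are ASSUMED: FFT length in K, largest exponent
   that length can handle, and time per iteration in msec. */
typedef struct {
    int    fft_k;
    long   max_exp;
    double msec;
} fft_row;

/* Rows copied from the quoted "GeForce GTX 580 fft.txt" excerpt. */
static const fft_row gtx580_rows[] = {
    {2048, 38492887L, 2.9761},
    {2160, 40551479L, 3.5742},
    {2240, 42020509L, 3.6679},
    {2304, 43194913L, 3.6846},
    {2592, 48471289L, 3.9861},
    {2880, 53735041L, 4.6150},
    {3072, 57237889L, 4.9730},
    {3136, 58404433L, 4.9740},
};

/* Hypothetical helper: return the smallest FFT length (in K) whose
   limit covers exponent q, or -1 if no row is large enough.
   Rows must be sorted by increasing fft_k. */
int pick_fft_k(const fft_row *rows, size_t n, long q)
{
    for (size_t i = 0; i < n; i++) {
        if (rows[i].max_exp >= q)
            return rows[i].fft_k;
    }
    return -1;
}
```

Under that reading, an exponent of 40000000 would get the 2160K length, since it is above the 2048K limit and below the 2160K one.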

chappjc 2013-12-03 23:07

Attached is the output of [FONT="Courier New"]CUDALucas -cufftbench 2592 2592 6[/FONT]. I don't get what it means by "best time" as it seems unrelated to the "ave time" values reported for the different threads.

owftheevil 2013-12-04 14:43

Thanks for those results. I'm still perplexed.

The first 36 lines time only the two normalization kernels, which are the only things that depend on the thread values being varied. The last six lines test a full LL iteration: the two normalization kernels, the multiplication kernel, and two ffts.
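
For reference, a minimal sketch of what one LL (Lucas-Lehmer) iteration computes, in plain 64-bit C for small exponents only. This is not how CUDALucas does it: on the GPU the squaring goes through the ffts and the multiplication kernel, and the carries are dealt with in the normalization kernels, while here a single `%` stands in for all of that. The function name `ll_is_prime` and the range limit are illustrative choices.

```c
#include <stdint.h>

/* Lucas-Lehmer test for M_p = 2^p - 1, plain integer arithmetic.
   p is assumed to be an odd prime with 3 <= p <= 31 so that s * s
   fits in 64 bits. Returns 1 if M_p is prime, 0 if it is not,
   and -1 if p is outside the supported range. */
int ll_is_prime(unsigned p)
{
    if (p < 3 || p > 31)
        return -1;

    uint64_t m = ((uint64_t)1 << p) - 1;  /* the Mersenne number M_p */
    uint64_t s = 4;                       /* starting value of the sequence */

    for (unsigned i = 0; i < p - 2; i++) {
        /* One LL iteration: square, subtract 2, reduce mod M_p.
           Adding m first keeps the intermediate value non-negative. */
        s = (s * s + m - 2) % m;
    }
    return s == 0;
}
```

The loop body is the part the last six benchmark lines are timing, just at a vastly larger size.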

flashjh 2013-12-08 21:11

CUDALucas 2.05 Beta r52 has been posted to SourceForge. New Windows executables are [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]here[/URL].

The code that exited when one of those fft hangs occurred has been removed. The problem is that Windows resets the driver after the timeout error, and the code needs to wait and then check whether everything is ready to go.
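
To make the "wait and then check" idea concrete, here is a minimal sketch of a bounded polling loop. Everything in it is hypothetical: `wait_for_device`, the `ready_fn` probe and the `sleep_fn` hook are invented for illustration, and what a real readiness probe should call after a driver reset is left open.

```c
#include <stddef.h>

/* Hypothetical probe: returns nonzero once the device answers again.
   In a real program this might wrap some cheap runtime call. */
typedef int (*ready_fn)(void *ctx);

/* Hypothetical sleep hook, passed in so the wait can be stubbed out. */
typedef void (*sleep_fn)(unsigned msec);

/* Probe up to max_tries times, sleeping delay_msec between probes.
   Returns the number of probes used on success, or -1 after giving up. */
int wait_for_device(ready_fn ready, void *ctx,
                    sleep_fn nap, unsigned delay_msec, int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (ready(ctx))
            return attempt;
        if (nap != NULL && attempt < max_tries)
            nap(delay_msec);
    }
    return -1;
}

/* Demonstration probe: reports ready on the n-th call, where n is the
   int that ctx points to. */
static int ready_on_nth_call(void *ctx)
{
    int *remaining = (int *)ctx;
    *remaining -= 1;
    return *remaining <= 0;
}
```

Capping the number of tries means that a card which never comes back still ends in a clean exit rather than an endless loop.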

[B]Just a heads-up: the timing is now handled a little differently, so tests resumed from old savefiles will give incorrect ETAs.[/B]

There is also now a simple checksum to verify the disk data, replacing the old "does the save file have the prime q in the correct location" method of verification. So that old savefiles can still be used with the new format, the check isn't enforced yet, but a warning is given when the checksums don't match.
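
As an illustration of the warn-but-don't-enforce behaviour, here is a minimal sketch. The additive checksum, the `save_status` values and both function names are assumptions chosen for the example; the actual checksum and savefile layout in r52 may well differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple additive checksum over 32-bit words. This is an illustrative
   choice, not necessarily the checksum CUDALucas uses. */
uint32_t simple_checksum(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];  /* unsigned wraparound is well defined in C */
    return sum;
}

enum save_status { SAVE_OK = 0, SAVE_WARN_MISMATCH = 1, SAVE_REJECT = 2 };

/* Compare the stored checksum with a freshly computed one.
   With enforce == 0 a mismatch only produces a warning status, so
   files written before the checksum existed can still be loaded. */
enum save_status check_savefile(const uint32_t *words, size_t n,
                                uint32_t stored, int enforce)
{
    if (simple_checksum(words, n) == stored)
        return SAVE_OK;
    return enforce ? SAVE_REJECT : SAVE_WARN_MISMATCH;
}
```

Switching `enforce` to 1 later would turn the warning into a hard rejection without touching the rest of the loading code.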

Other changes:
1. overflow error checking
2. consolidated device memory allocations, reduces amount used slightly
3. tighter fft selection
4. better error handling
5. method for thread testing a range instead of just a single fft (e.g. ./CUDALucas -cufftbench 8192 1 5, end of range first)
6. put the ffts back into threads test, slower but much more accurate results on cards used for display

Please test and post results. If anyone has verified mismatches with r50, please post them, and if you get any with this version, let us know. Thanks!

ET_ 2013-12-08 21:24

[QUOTE=flashjh;361490]CUDALucas 2.05 Beta r52 posted to sourceforge. New Windows executables are [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]here[/URL].[/QUOTE]

Is this a Windows-only update?

Luigi

flashjh 2013-12-08 22:12

No, it applies to Linux as well. I requested a Linux file for SourceForge, but if you can compile it yourself, the updates need to be tested. Thanks.

flashjh 2013-12-10 23:44

I'm still getting the stop error and the batch file needs to keep CUDALucas going.

This is the code identified for the error:[CODE]
void reset_err(float* maxerr, float value)
{
  cutilSafeCall (cudaMemset (g_err, 0, sizeof (float)));
  *maxerr *= value;
}
[/CODE]
This is the screen output:[CODE]Using threads: norm1 128, mult 256, norm2 128.
C:/CUDA/CuLu/src/CUDALucas.cu(543) : cufftSafeCall() CUFFT error 9999: CUFFT Unknown error code[/CODE]

kladner 2013-12-10 23:48

[QUOTE][CODE]Using threads: norm1 128, mult 256, norm2 128.
C:/CUDA/CuLu/src/CUDALucas.cu(543) : cufftSafeCall() CUFFT error 9999: CUFFT Unknown error code[/CODE][/QUOTE]

Interesting. The error has changed, at least from the one I saw when the program quit for me.

flashjh 2013-12-10 23:58

It's different now because owftheevil made changes to the code. We're still seeing if we can get the program to catch and clear the fault without exiting on Windows.

mognuts 2013-12-13 20:38

I'm getting this error, if it's of any use to anybody:
[CODE]D:\Cuda\CUDALucas>CUDALucas_205Beta_x64_r52.exe -cufftbench 1 8192 5
------- DEVICE 0 -------
name GeForce GTX 570
Compatibility 2.0
clockRate (MHz) 1464
memClockRate (MHz) 1900
totalGlobalMem 1342177280
totalConstMem 65536
l2CacheSize 655360
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 15
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1
CUDA bench, testing reasonable fft sizes 1K to 8192K, doing 5 passes.
fft size = 1K, ave time = 0.0273 msec, max-ave = 0.00060
fft size = 2K, ave time = 0.0329 msec, max-ave = 0.00684
fft size = 3K, ave time = 0.0716 msec, max-ave = 0.00737
fft size = 4K, ave time = 0.0540 msec, max-ave = 0.00778
fft size = 5K, ave time = 0.0529 msec, max-ave = 0.00765
fft size = 6K, ave time = 0.0525 msec, max-ave = 0.00007
fft size = 7K, ave time = 0.1198 msec, max-ave = 0.00315
fft size = 8K, ave time = 0.0514 msec, max-ave = 0.00005
fft size = 9K, ave time = 0.0540 msec, max-ave = 0.00015
fft size = 10K, ave time = 0.0605 msec, max-ave = 0.00526
fft size = 12K, ave time = 0.0639 msec, max-ave = 0.00018
fft size = 14K, ave time = 0.0599 msec, max-ave = 0.00308
fft size = 15K, ave time = 0.1332 msec, max-ave = 0.00249
fft size = 16K, ave time = 0.0682 msec, max-ave = 0.00304
fft size = 18K, ave time = 0.0624 msec, max-ave = 0.00113
fft size = 20K, ave time = 0.0726 msec, max-ave = 0.00299
fft size = 21K, ave time = 0.0738 msec, max-ave = 0.00237
fft size = 24K, ave time = 0.0891 msec, max-ave = 0.00284
fft size = 25K, ave time = 0.1378 msec, max-ave = 0.00328
fft size = 27K, ave time = 0.1417 msec, max-ave = 0.00018
fft size = 28K, ave time = 0.0928 msec, max-ave = 0.00369
fft size = 30K, ave time = 0.0948 msec, max-ave = 0.00302
fft size = 32K, ave time = 0.0824 msec, max-ave = 0.00008
fft size = 35K, ave time = 0.1550 msec, max-ave = 0.00241
fft size = 36K, ave time = 0.0995 msec, max-ave = 0.00247
fft size = 40K, ave time = 0.1051 msec, max-ave = 0.00247
fft size = 42K, ave time = 0.1085 msec, max-ave = 0.00206
fft size = 45K, ave time = 0.1684 msec, max-ave = 0.00200
fft size = 48K, ave time = 0.1081 msec, max-ave = 0.00262
fft size = 49K, ave time = 0.1212 msec, max-ave = 0.00285
fft size = 50K, ave time = 0.1188 msec, max-ave = 0.00167
fft size = 54K, ave time = 0.1316 msec, max-ave = 0.00317
fft size = 56K, ave time = 0.1183 msec, max-ave = 0.00104
fft size = 60K, ave time = 0.1417 msec, max-ave = 0.00308
fft size = 63K, ave time = 0.1869 msec, max-ave = 0.00165
fft size = 64K, ave time = 0.1429 msec, max-ave = 0.00278
fft size = 70K, ave time = 0.1678 msec, max-ave = 0.00228
fft size = 72K, ave time = 0.1714 msec, max-ave = 0.00192
fft size = 75K, ave time = 0.2364 msec, max-ave = 0.00292
fft size = 80K, ave time = 0.1697 msec, max-ave = 0.00258
fft size = 81K, ave time = 0.1969 msec, max-ave = 0.00242
fft size = 84K, ave time = 0.1873 msec, max-ave = 0.00353
fft size = 90K, ave time = 0.1956 msec, max-ave = 0.00296
fft size = 96K, ave time = 0.1912 msec, max-ave = 0.00261
fft size = 98K, ave time = 0.2060 msec, max-ave = 0.00251
fft size = 100K, ave time = 0.2082 msec, max-ave = 0.00247
fft size = 105K, ave time = 0.2809 msec, max-ave = 0.01314
fft size = 108K, ave time = 0.2220 msec, max-ave = 0.00268
fft size = 112K, ave time = 0.2066 msec, max-ave = 0.00269
fft size = 120K, ave time = 0.2396 msec, max-ave = 0.00223
fft size = 125K, ave time = 0.3224 msec, max-ave = 0.00303
fft size = 126K, ave time = 0.2600 msec, max-ave = 0.00272
fft size = 128K, ave time = 0.2473 msec, max-ave = 0.00267
fft size = 135K, ave time = 0.3416 msec, max-ave = 0.00243
fft size = 140K, ave time = 0.2864 msec, max-ave = 0.00152
fft size = 144K, ave time = 0.2645 msec, max-ave = 0.00264
fft size = 147K, ave time = 0.3659 msec, max-ave = 0.00392
fft size = 150K, ave time = 0.3193 msec, max-ave = 0.00333
fft size = 160K, ave time = 0.2903 msec, max-ave = 0.00386
fft size = 162K, ave time = 0.3330 msec, max-ave = 0.00242
fft size = 168K, ave time = 0.3331 msec, max-ave = 0.00439
fft size = 175K, ave time = 0.4022 msec, max-ave = 0.00344
fft size = 180K, ave time = 0.3385 msec, max-ave = 0.00578
fft size = 189K, ave time = 0.4385 msec, max-ave = 0.00424
fft size = 192K, ave time = 0.3540 msec, max-ave = 0.00371
fft size = 196K, ave time = 0.3763 msec, max-ave = 0.00530
fft size = 200K, ave time = 0.3905 msec, max-ave = 0.00511
fft size = 210K, ave time = 0.4171 msec, max-ave = 0.00389
fft size = 216K, ave time = 0.4135 msec, max-ave = 0.00383
fft size = 224K, ave time = 0.3805 msec, max-ave = 0.00748
fft size = 225K, ave time = 0.4789 msec, max-ave = 0.00789
fft size = 240K, ave time = 0.4466 msec, max-ave = 0.01557
fft size = 243K, ave time = 0.4917 msec, max-ave = 0.00647
fft size = 245K, ave time = 0.5389 msec, max-ave = 0.00815
fft size = 250K, ave time = 0.4767 msec, max-ave = 0.00844
fft size = 252K, ave time = 0.4824 msec, max-ave = 0.00267
fft size = 256K, ave time = 0.4456 msec, max-ave = 0.00454
fft size = 270K, ave time = 0.5332 msec, max-ave = 0.00474
fft size = 280K, ave time = 0.5253 msec, max-ave = 0.00931
fft size = 288K, ave time = 0.4752 msec, max-ave = 0.01467
fft size = 294K, ave time = 0.5797 msec, max-ave = 0.01844
fft size = 300K, ave time = 0.5838 msec, max-ave = 0.01188
fft size = 315K, ave time = 0.6671 msec, max-ave = 0.00862
fft size = 320K, ave time = 0.5398 msec, max-ave = 0.00571
fft size = 324K, ave time = 0.6093 msec, max-ave = 0.00350
fft size = 336K, ave time = 0.6200 msec, max-ave = 0.00447
fft size = 343K, ave time = 0.6894 msec, max-ave = 0.00486
fft size = 350K, ave time = 0.6783 msec, max-ave = 0.00658
fft size = 360K, ave time = 0.6460 msec, max-ave = 0.00605
fft size = 375K, ave time = 0.8148 msec, max-ave = 0.00743
fft size = 378K, ave time = 0.7359 msec, max-ave = 0.00680
fft size = 384K, ave time = 0.6703 msec, max-ave = 0.00187
fft size = 392K, ave time = 0.7014 msec, max-ave = 0.00381
fft size = 400K, ave time = 0.7023 msec, max-ave = 0.00418
fft size = 405K, ave time = 0.8098 msec, max-ave = 0.00206
C:/CUDA/CuLu/src/CUDALucas.cu(1877) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
C:/CUDA/CuLu/src/CUDALucas.cu(1886) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.[/CODE]
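
One thing the log above shows is that the time is not monotonic in fft size: 7K (0.1198 msec) is slower than 8K (0.0514 msec), for example. Below is a minimal sketch of filtering out sizes that are beaten by a larger size. It assumes that "no larger size is at least as fast" is a sensible criterion for a useful size, and it does not claim to be what CUDALucas itself does. The names `bench_row`, `sample_rows` and `keep_useful_sizes` are invented for the example.

```c
#include <stddef.h>

/* One benchmark line: FFT size in K and average time in msec. */
typedef struct {
    int    fft_k;
    double msec;
} bench_row;

/* First eight lines of the log above (1K to 8K). */
static const bench_row sample_rows[] = {
    {1, 0.0273}, {2, 0.0329}, {3, 0.0716}, {4, 0.0540},
    {5, 0.0529}, {6, 0.0525}, {7, 0.1198}, {8, 0.0514},
};

/* Set keep[i] = 1 when no larger FFT size is at least as fast as
   row i, else 0. Rows must be sorted by increasing fft_k.
   Returns the number of rows kept. */
size_t keep_useful_sizes(const bench_row *rows, size_t n, int *keep)
{
    size_t kept = 0;
    double best_later = 0.0;  /* fastest time among larger sizes so far */
    int have_later = 0;

    /* Walk from the largest size down, tracking the best time seen. */
    for (size_t i = n; i-- > 0; ) {
        keep[i] = !have_later || rows[i].msec < best_later;
        if (keep[i]) {
            best_later = rows[i].msec;
            have_later = 1;
            kept++;
        }
    }
    return kept;
}
```

On those eight lines only 1K, 2K and 8K survive, because 3K through 7K are all slower than 8K.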


All times are UTC.