mersenneforum.org  

Old 2013-12-03, 21:22   #2091
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts

Quote:
Originally Posted by flashjh View Post
If you have the code working for Linux, can you commit/merge it with the changes on SourceForge so I can take a look at it on Windows?
Just putting this out there for thought...

If you're not having fun, perhaps you should be doing something different.

Clearly we're having fun here, even if some don't understand the interest, the work, or the humor....
Old 2013-12-03, 22:31   #2092
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

473₈ Posts

Quote:
Originally Posted by chappjc View Post
From "GeForce GTX 580 fft.txt":

Code:
 2048   38492887   2.9761
 2160   40551479   3.5742
 2240   42020509   3.6679
 2304   43194913   3.6846
 2592   48471289   3.9861
 2880   53735041   4.6150
 3072   57237889   4.9730
 3136   58404433   4.9740
Do you want to see the full output from -cufftbench 2592 2592 6?
Yes, that would be useful.
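
For anyone reading the table quoted above, here is a minimal sketch of how such a file could be used to choose an FFT length. It assumes the columns are FFT length in K, the largest exponent that length can handle, and msec per iteration. The struct and the helper `pick_fft_k` are hypothetical names for illustration, not CUDALucas code.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the quoted table. Column meanings are assumptions. */
struct fft_row {
    int    size_k;   /* FFT length in K */
    long   max_exp;  /* assumed: largest exponent this length handles */
    double ms;       /* assumed: msec per iteration */
};

/* The eight rows quoted from "GeForce GTX 580 fft.txt". */
static const struct fft_row gtx580_rows[] = {
    {2048, 38492887L, 2.9761},
    {2160, 40551479L, 3.5742},
    {2240, 42020509L, 3.6679},
    {2304, 43194913L, 3.6846},
    {2592, 48471289L, 3.9861},
    {2880, 53735041L, 4.6150},
    {3072, 57237889L, 4.9730},
    {3136, 58404433L, 4.9740},
};
static const size_t gtx580_count = sizeof gtx580_rows / sizeof gtx580_rows[0];

/* Hypothetical helper: smallest listed length whose limit covers
   exponent q, or 0 if q is beyond the last row. */
int pick_fft_k(const struct fft_row *rows, size_t n, long q)
{
    assert(rows != NULL);
    for (size_t i = 0; i < n; i++)   /* rows are sorted by size */
        if (q <= rows[i].max_exp)
            return rows[i].size_k;
    return 0;
}
```

Under those assumptions, an exponent just above the 2304K limit would land on 2592K, the length being benchmarked here.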
Old 2013-12-03, 23:07   #2093
chappjc
Jul 2007

20₁₀ Posts

Attached is the output of CUDALucas -cufftbench 2592 2592 6. I don't get what it means by "best time" as it seems unrelated to the "ave time" values reported for the different threads.
Attached Files
File Type: txt cufftbench_2592.txt (4.7 KB, 130 views)
Old 2013-12-04, 14:43   #2094
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²·5·7 Posts

Thanks for those results. I'm still perplexed.

The first 36 lines time only the two normalization kernels, which are the only things that depend on the thread values being varied. The last six lines test a full LL iteration: the two normalization kernels, the multiplication kernel, and two FFTs.
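
As a reminder of what one LL iteration computes, here is a minimal sketch in plain integer arithmetic: s -> s*s - 2 (mod 2^p - 1). This is not the CUDALucas code path, where the squaring is done with FFTs and the normalization kernels handle carries and rounding. The function names are made up for illustration, and it only works for small p.

```c
#include <assert.h>
#include <stdint.h>

/* One Lucas-Lehmer step: s -> s*s - 2 (mod m), with m = 2^p - 1. */
static uint64_t ll_step(uint64_t s, uint64_t m)
{
    uint64_t sq = (s * s) % m;   /* the squaring; FFT-based in the real program */
    return (sq + m - 2) % m;     /* subtract 2 without going negative */
}

/* Returns 1 if 2^p - 1 is prime, for an odd prime p small enough
   that the squaring fits in 64 bits. */
int ll_is_prime(unsigned p)
{
    assert(p >= 3 && p <= 31);   /* s < 2^31, so s*s < 2^62 */
    uint64_t m = ((uint64_t)1 << p) - 1;
    uint64_t s = 4;
    for (unsigned i = 0; i < p - 2; i++)
        s = ll_step(s, m);
    return s == 0;
}
```

The test takes p - 2 of these steps, which is why the per-iteration time in the benchmark matters so much for large exponents.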
Old 2013-12-08, 21:11   #2095
flashjh
"Jerry"
Nov 2011
Vancouver, WA

1123₁₀ Posts

CUDALucas 2.05 Beta r52 has been posted to SourceForge. New Windows executables are here.

The code to exit when one of those FFT hangs occurs has been deleted. The problem is that Windows resets the driver after the timeout error, so the code needs to wait and then check that everything is ready to go.
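
The wait-then-check idea could look roughly like the sketch below. It is only an illustration: `wait_for_device`, the `probe_fn` and `sleep_fn` callbacks, and the stand-in probe `probe_countdown` are all hypothetical names. A real probe would have to query the CUDA runtime in some way, and that part is not shown.

```c
#include <assert.h>
#include <stddef.h>

typedef int  (*probe_fn)(void *ctx);   /* hypothetical: returns 1 once the device answers */
typedef void (*sleep_fn)(unsigned ms); /* hypothetical: platform sleep, may be NULL */

/* Poll the device up to max_tries times, pausing between attempts.
   Returns the attempt number that succeeded, or 0 if none did. */
int wait_for_device(probe_fn probe, void *ctx, sleep_fn pause,
                    unsigned delay_ms, int max_tries)
{
    assert(probe != NULL && max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (probe(ctx))
            return attempt;      /* driver is back, safe to resume */
        if (pause != NULL)
            pause(delay_ms);     /* give the driver reset time to finish */
    }
    return 0;                    /* give up and let the caller exit */
}

/* Stand-in probe for demonstration: fails *ctx times, then succeeds. */
int probe_countdown(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}
```

Capping the number of attempts keeps the program from spinning forever if the card never comes back, in which case exiting is still the right fallback.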

Just a heads-up: the timing is now handled a little differently, so tests resumed from old save files will give incorrect ETAs.

There is also now a simple checksum to verify the disk data, replacing the old "does the save file have the prime q in the correct location" method of verification. So that old save files can still be used with the new format, the check is not enforced yet, but a warning is given when the checksums don't match.
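
To show the warn-but-don't-reject behaviour, here is a minimal sketch using a plain 32-bit additive checksum. The actual checksum used in r52 is not described here, so both the algorithm and the names `simple_checksum` and `checksum_matches` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed checksum: sum of all data words, wrapping modulo 2^32. */
uint32_t simple_checksum(const uint32_t *words, size_t n)
{
    assert(words != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];         /* unsigned overflow wraps by definition */
    return sum;
}

/* Returns 1 if the stored checksum matches the data, 0 otherwise.
   The caller decides what to do: for now it would only print a
   warning on 0, so that old save files still load. */
int checksum_matches(const uint32_t *words, size_t n, uint32_t stored)
{
    return simple_checksum(words, n) == stored;
}
```

Once old-format save files are no longer in circulation, the same return value could be used to refuse the file instead of just warning.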

Other changes:
1. overflow error checking
2. consolidated device memory allocations, which slightly reduces the amount used
3. tighter FFT selection
4. better error handling
5. a method for thread testing a range instead of just a single FFT (e.g. ./CUDALucas -cufftbench 8192 1 5, end of range first)
6. put the FFTs back into the threads test; slower, but much more accurate results on cards used for display

Please test and post results. If anyone has verified mismatches with r50, please post them, and if you get any with this version, let us know. Thanks!
Old 2013-12-08, 21:24   #2096
ET_
Banned
"Luigi"
Aug 2002
Team Italia

2×3×11×73 Posts

Quote:
Originally Posted by flashjh View Post
CUDALucas 2.05 Beta r52 posted to sourceforge. New Windows executables are here.
Is this a Windows-only update?

Luigi
Old 2013-12-08, 22:12   #2097
flashjh
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

No, it applies to Linux as well. I requested a Linux file for SourceForge, but if you can compile it yourself, the updates need to be tested. Thanks.
Old 2013-12-10, 23:44   #2098
flashjh
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

I'm still getting the stop error, so the batch file is still needed to keep CUDALucas going.

This is the code identified for the error:
Code:

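 // Zero the error value on the device and scale the host-side maximum.
 void reset_err(float* maxerr, float value)
 {
 cutilSafeCall (cudaMemset (g_err, 0, sizeof (float)));  // g_err is in device memory
 *maxerr *= value;
 }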
 
This is the screen output:
Code:
Using threads: norm1 128, mult 256, norm2 128.
C:/CUDA/CuLu/src/CUDALucas.cu(543) : cufftSafeCall() CUFFT error 9999: CUFFT Unknown error code
Old 2013-12-10, 23:48   #2099
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Code:
Using threads: norm1 128, mult 256, norm2 128.
C:/CUDA/CuLu/src/CUDALucas.cu(543) : cufftSafeCall() CUFFT error 9999: CUFFT Unknown error code
Interesting. The error has changed, at least from the one I saw when the program quit for me.
Old 2013-12-10, 23:58   #2100
flashjh
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

It's different now because owftheevil made changes to the code. We're still seeing if we can get the program to catch and clear the fault without exiting on Windows.
Old 2013-12-13, 20:38   #2101
mognuts
Sep 2008
Bromley, England

43 Posts

I'm getting this error, if it's of any use to anybody:
Code:
D:\Cuda\CUDALucas>CUDALucas_205Beta_x64_r52.exe -cufftbench 1 8192 5
 ------- DEVICE 0 -------
name                GeForce GTX 570
Compatibility       2.0
clockRate (MHz)     1464
memClockRate (MHz)  1900
totalGlobalMem      1342177280
totalConstMem       65536
l2CacheSize         655360
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 15
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    512
deviceOverlap       1
 CUDA bench, testing reasonable fft sizes 1K to 8192K, doing 5 passes.
fft size = 1K, ave time = 0.0273 msec, max-ave = 0.00060
fft size = 2K, ave time = 0.0329 msec, max-ave = 0.00684
fft size = 3K, ave time = 0.0716 msec, max-ave = 0.00737
fft size = 4K, ave time = 0.0540 msec, max-ave = 0.00778
fft size = 5K, ave time = 0.0529 msec, max-ave = 0.00765
fft size = 6K, ave time = 0.0525 msec, max-ave = 0.00007
fft size = 7K, ave time = 0.1198 msec, max-ave = 0.00315
fft size = 8K, ave time = 0.0514 msec, max-ave = 0.00005
fft size = 9K, ave time = 0.0540 msec, max-ave = 0.00015
fft size = 10K, ave time = 0.0605 msec, max-ave = 0.00526
fft size = 12K, ave time = 0.0639 msec, max-ave = 0.00018
fft size = 14K, ave time = 0.0599 msec, max-ave = 0.00308
fft size = 15K, ave time = 0.1332 msec, max-ave = 0.00249
fft size = 16K, ave time = 0.0682 msec, max-ave = 0.00304
fft size = 18K, ave time = 0.0624 msec, max-ave = 0.00113
fft size = 20K, ave time = 0.0726 msec, max-ave = 0.00299
fft size = 21K, ave time = 0.0738 msec, max-ave = 0.00237
fft size = 24K, ave time = 0.0891 msec, max-ave = 0.00284
fft size = 25K, ave time = 0.1378 msec, max-ave = 0.00328
fft size = 27K, ave time = 0.1417 msec, max-ave = 0.00018
fft size = 28K, ave time = 0.0928 msec, max-ave = 0.00369
fft size = 30K, ave time = 0.0948 msec, max-ave = 0.00302
fft size = 32K, ave time = 0.0824 msec, max-ave = 0.00008
fft size = 35K, ave time = 0.1550 msec, max-ave = 0.00241
fft size = 36K, ave time = 0.0995 msec, max-ave = 0.00247
fft size = 40K, ave time = 0.1051 msec, max-ave = 0.00247
fft size = 42K, ave time = 0.1085 msec, max-ave = 0.00206
fft size = 45K, ave time = 0.1684 msec, max-ave = 0.00200
fft size = 48K, ave time = 0.1081 msec, max-ave = 0.00262
fft size = 49K, ave time = 0.1212 msec, max-ave = 0.00285
fft size = 50K, ave time = 0.1188 msec, max-ave = 0.00167
fft size = 54K, ave time = 0.1316 msec, max-ave = 0.00317
fft size = 56K, ave time = 0.1183 msec, max-ave = 0.00104
fft size = 60K, ave time = 0.1417 msec, max-ave = 0.00308
fft size = 63K, ave time = 0.1869 msec, max-ave = 0.00165
fft size = 64K, ave time = 0.1429 msec, max-ave = 0.00278
fft size = 70K, ave time = 0.1678 msec, max-ave = 0.00228
fft size = 72K, ave time = 0.1714 msec, max-ave = 0.00192
fft size = 75K, ave time = 0.2364 msec, max-ave = 0.00292
fft size = 80K, ave time = 0.1697 msec, max-ave = 0.00258
fft size = 81K, ave time = 0.1969 msec, max-ave = 0.00242
fft size = 84K, ave time = 0.1873 msec, max-ave = 0.00353
fft size = 90K, ave time = 0.1956 msec, max-ave = 0.00296
fft size = 96K, ave time = 0.1912 msec, max-ave = 0.00261
fft size = 98K, ave time = 0.2060 msec, max-ave = 0.00251
fft size = 100K, ave time = 0.2082 msec, max-ave = 0.00247
fft size = 105K, ave time = 0.2809 msec, max-ave = 0.01314
fft size = 108K, ave time = 0.2220 msec, max-ave = 0.00268
fft size = 112K, ave time = 0.2066 msec, max-ave = 0.00269
fft size = 120K, ave time = 0.2396 msec, max-ave = 0.00223
fft size = 125K, ave time = 0.3224 msec, max-ave = 0.00303
fft size = 126K, ave time = 0.2600 msec, max-ave = 0.00272
fft size = 128K, ave time = 0.2473 msec, max-ave = 0.00267
fft size = 135K, ave time = 0.3416 msec, max-ave = 0.00243
fft size = 140K, ave time = 0.2864 msec, max-ave = 0.00152
fft size = 144K, ave time = 0.2645 msec, max-ave = 0.00264
fft size = 147K, ave time = 0.3659 msec, max-ave = 0.00392
fft size = 150K, ave time = 0.3193 msec, max-ave = 0.00333
fft size = 160K, ave time = 0.2903 msec, max-ave = 0.00386
fft size = 162K, ave time = 0.3330 msec, max-ave = 0.00242
fft size = 168K, ave time = 0.3331 msec, max-ave = 0.00439
fft size = 175K, ave time = 0.4022 msec, max-ave = 0.00344
fft size = 180K, ave time = 0.3385 msec, max-ave = 0.00578
fft size = 189K, ave time = 0.4385 msec, max-ave = 0.00424
fft size = 192K, ave time = 0.3540 msec, max-ave = 0.00371
fft size = 196K, ave time = 0.3763 msec, max-ave = 0.00530
fft size = 200K, ave time = 0.3905 msec, max-ave = 0.00511
fft size = 210K, ave time = 0.4171 msec, max-ave = 0.00389
fft size = 216K, ave time = 0.4135 msec, max-ave = 0.00383
fft size = 224K, ave time = 0.3805 msec, max-ave = 0.00748
fft size = 225K, ave time = 0.4789 msec, max-ave = 0.00789
fft size = 240K, ave time = 0.4466 msec, max-ave = 0.01557
fft size = 243K, ave time = 0.4917 msec, max-ave = 0.00647
fft size = 245K, ave time = 0.5389 msec, max-ave = 0.00815
fft size = 250K, ave time = 0.4767 msec, max-ave = 0.00844
fft size = 252K, ave time = 0.4824 msec, max-ave = 0.00267
fft size = 256K, ave time = 0.4456 msec, max-ave = 0.00454
fft size = 270K, ave time = 0.5332 msec, max-ave = 0.00474
fft size = 280K, ave time = 0.5253 msec, max-ave = 0.00931
fft size = 288K, ave time = 0.4752 msec, max-ave = 0.01467
fft size = 294K, ave time = 0.5797 msec, max-ave = 0.01844
fft size = 300K, ave time = 0.5838 msec, max-ave = 0.01188
fft size = 315K, ave time = 0.6671 msec, max-ave = 0.00862
fft size = 320K, ave time = 0.5398 msec, max-ave = 0.00571
fft size = 324K, ave time = 0.6093 msec, max-ave = 0.00350
fft size = 336K, ave time = 0.6200 msec, max-ave = 0.00447
fft size = 343K, ave time = 0.6894 msec, max-ave = 0.00486
fft size = 350K, ave time = 0.6783 msec, max-ave = 0.00658
fft size = 360K, ave time = 0.6460 msec, max-ave = 0.00605
fft size = 375K, ave time = 0.8148 msec, max-ave = 0.00743
fft size = 378K, ave time = 0.7359 msec, max-ave = 0.00680
fft size = 384K, ave time = 0.6703 msec, max-ave = 0.00187
fft size = 392K, ave time = 0.7014 msec, max-ave = 0.00381
fft size = 400K, ave time = 0.7023 msec, max-ave = 0.00418
fft size = 405K, ave time = 0.8098 msec, max-ave = 0.00206
C:/CUDA/CuLu/src/CUDALucas.cu(1877) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
C:/CUDA/CuLu/src/CUDALucas.cu(1886) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.