#2091
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts
Quote:
If you're not having fun, perhaps you should be doing something different. Clearly we're having fun here, even if some don't understand the interest, the work, or the humor...
#2092
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
4738 Posts
Quote:
#2093
Jul 2007
2010 Posts |
Attached is the output of CUDALucas -cufftbench 2592 2592 6. I don't get what it means by "best time" as it seems unrelated to the "ave time" values reported for the different threads.
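Going by the column labels alone, each -cufftbench row appears to summarize several timed passes: "ave time" as the mean over the passes and, in the listing further down the thread, "max-ave" as how far the slowest pass lies above that mean. The C sketch below computes those two numbers under that assumption. The struct and function names are hypothetical, and it deliberately does not try to reproduce CUDALucas's "best time", whose definition isn't stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-row summary, assuming the column labels mean what they say. */
typedef struct {
    double ave_ms;     /* mean time over all passes, in msec */
    double max_ave_ms; /* slowest pass minus the mean, in msec */
} bench_stats;

bench_stats summarize_passes(const double *times_ms, size_t passes)
{
    assert(times_ms != NULL && passes > 0);
    double sum = 0.0;
    double max = times_ms[0];
    for (size_t i = 0; i < passes; i++) {
        sum += times_ms[i];
        if (times_ms[i] > max)
            max = times_ms[i];
    }
    bench_stats s;
    s.ave_ms = sum / (double)passes;
    s.max_ave_ms = max - s.ave_ms;
    return s;
}
```

A small max-ave relative to the average suggests the passes were consistent; a large one points at interference, for example from a card that is also driving a display.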
#2094
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
32·5·7 Posts |
Thanks for those results. I'm still perplexed.
The first 36 lines time only the two normalization kernels, which are the only parts that depend on the thread values being varied. The last six lines time a full LL iteration: the two normalization kernels, the multiplication kernel, and two FFTs.
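For readers new to the terminology: one LL (Lucas-Lehmer) iteration replaces the residue s with s^2 - 2 mod M_p, where M_p = 2^p - 1, starting from s = 4; after p - 2 iterations, M_p is prime exactly when s = 0. CUDALucas does the squaring with a forward FFT, the pointwise multiplication kernel and an inverse FFT, then (roughly speaking) uses the two normalization kernels to round the result and propagate carries. The C sketch below performs the same recurrence exactly with 64-bit integers instead of FFTs, so it is only valid for tiny prime exponents (3 <= p <= 31); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduce x modulo M_p = 2^p - 1 using 2^p = 1 (mod M_p):
 * fold the high bits onto the low bits until x fits in p bits. */
static uint64_t mod_mersenne(uint64_t x, unsigned p)
{
    const uint64_t m = (UINT64_C(1) << p) - 1;
    while (x > m)
        x = (x & m) + (x >> p);
    return x == m ? 0 : x;
}

/* One LL iteration: s <- s^2 - 2 (mod M_p), done exactly. */
static uint64_t ll_step(uint64_t s, unsigned p)
{
    const uint64_t m = (UINT64_C(1) << p) - 1;
    uint64_t sq = mod_mersenne(s * s, p); /* s < 2^31, so s*s < 2^62 */
    return (sq + m - 2) % m;              /* subtract 2 without underflow */
}

/* Lucas-Lehmer test of M_p for prime p in [3, 31] (sketch limits). */
bool ll_is_mersenne_prime(unsigned p)
{
    assert(p >= 3 && p <= 31);
    uint64_t s = 4;
    for (unsigned i = 0; i < p - 2; i++)
        s = ll_step(s, p);
    return s == 0;
}
```

For example, p = 5 gives the residues 4, 14, 8, 0, so M_5 = 31 is reported prime, while p = 11 ends nonzero because 2047 = 23 * 89.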
#2095
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts
CUDALucas 2.05 Beta r52 posted to SourceForge. New Windows executables are here.
The code that exited when one of those FFT hangs occurred has been removed. The problem is that Windows resets the driver after the timeout error, so the code needs to wait and then check whether everything is ready to go. Just a heads-up: timing is now handled a little differently, so tests resumed from old save files will give incorrect ETAs. There is also now a simple checksum to verify the disk data, replacing the old "does the save file have the prime q in the correct location" method of verification. So that old save files can still be used with the new format, the checksum isn't enforced yet, but you get a warning when the checksums don't match.
Other changes:
1. overflow error checking
2. consolidated device memory allocations, which slightly reduces the amount used
3. tighter FFT selection
4. better error handling
5. a method for thread-testing a range of FFT sizes instead of just a single one (e.g. ./CUDALucas -cufftbench 8192 1 5, end of range first)
6. the FFTs are back in the threads test: slower, but much more accurate results on cards used for display
Please test and post results. If anyone has verified mismatches with r50, please post them, and if you get any with this version, let us know. Thanks!
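The post does not say which checksum r52 uses. As an illustration only, here is a hypothetical additive checksum over the saved residue words, together with a load-time check that mirrors the transition behaviour described above: a mismatch produces a warning instead of a hard failure. The function names and the 32-bit word layout are assumptions, not CUDALucas's actual save-file format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical checksum: 32-bit wrap-around sum of the saved words. */
uint32_t simple_checksum(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i]; /* unsigned arithmetic wraps modulo 2^32 */
    return sum;
}

/* Warn-only check, so old save files without a valid checksum still load.
 * Returns 1 if the stored checksum matches, 0 (after a warning) otherwise. */
int check_savefile(const uint32_t *words, size_t n, uint32_t stored)
{
    assert(words != NULL || n == 0);
    uint32_t actual = simple_checksum(words, n);
    if (actual != stored) {
        fprintf(stderr,
                "warning: save file checksum mismatch "
                "(stored %08" PRIx32 ", computed %08" PRIx32 ")\n",
                stored, actual);
        return 0;
    }
    return 1;
}
```

A plain sum catches truncated or zeroed files cheaply but misses swapped words; a CRC would be the usual next step if that matters.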
#2096
Banned
"Luigi"
Aug 2002
Team Italia
2×3×11×73 Posts |
Quote:
Luigi
#2097
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts |
No, it applies to Linux as well. I requested a Linux file for SourceForge, but if you can compile it, please do; the updates need to be tested. Thanks.
#2098
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts |
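I'm still getting the stop error, and I still need the batch file to keep CUDALucas going.
This is the code identified for the error:
Code:
void reset_err(float* maxerr, float value)
{
  // Reset the device-side error value g_err to zero.
  cutilSafeCall (cudaMemset (g_err, 0, sizeof (float)));
  // Scale the host-side running maximum error by the given factor.
  *maxerr *= value;
}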
Code:
Using threads: norm1 128, mult 256, norm2 128.
C:/CUDA/CuLu/src/CUDALucas.cu(543) : cufftSafeCall() CUFFT error 9999: CUFFT Unknown error code
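For context on what reset_err clears: in FFT-based squaring the usual round-off error measure is the largest distance between any inverse-FFT output and its nearest integer, and a value approaching 0.5 means the result can no longer be rounded reliably. Assuming g_err holds that kind of maximum (the thread does not show the kernel that writes it), a host-side C version of the measurement might look like this; max_roundoff_error is a hypothetical name, and rounding is done by hand only to avoid a libm dependency.

```c
#include <assert.h>
#include <stddef.h>

/* Round to nearest integer, ties away from zero; fine for |x| < 2^62. */
static double round_nearest(double x)
{
    return (double)(long long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Largest |x[i] - nearest integer| over the array: the round-off error
 * measure assumed above. Host-side sketch, not the CUDA kernel. */
float max_roundoff_error(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    float maxerr = 0.0f;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - round_nearest(x[i]);
        float err = (float)(d < 0.0 ? -d : d);
        if (err > maxerr)
            maxerr = err;
    }
    return maxerr;
}
```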
#2099
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts |
Quote:
#2100
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts |
It's different now because owftheevil made changes to the code. We're still seeing whether we can get the program to catch and clear the fault without exiting on Windows.
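The approach described in the r52 notes (Windows resets the driver after the timeout, so the program should wait and then check whether the device is ready instead of exiting) can be sketched as a retry loop with a growing delay. This is a generic C sketch, not CUDALucas code: the probe and sleep callbacks are hypothetical hooks, and in a real build the probe would wrap whatever CUDA call the program uses to confirm the device is usable again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks: probe() reports whether the GPU is usable again;
 * sleep_ms() blocks (e.g. Sleep() on Windows, nanosleep() on POSIX). */
typedef bool (*probe_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned ms);

/* Double the delay, capped at max_ms, without unsigned overflow. */
unsigned next_backoff_delay(unsigned delay_ms, unsigned max_ms)
{
    return (delay_ms > max_ms / 2) ? max_ms : delay_ms * 2;
}

/* After a driver reset, wait and re-check instead of exiting.
 * Returns the attempt number that found the device ready, or -1. */
int wait_for_device(probe_fn probe, void *ctx, sleep_fn sleep_ms,
                    int max_attempts, unsigned first_delay_ms,
                    unsigned max_delay_ms)
{
    assert(probe != NULL && sleep_ms != NULL);
    assert(max_attempts > 0 && first_delay_ms > 0);
    unsigned delay = first_delay_ms;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        sleep_ms(delay);
        if (probe(ctx))
            return attempt;
        delay = next_backoff_delay(delay, max_delay_ms);
    }
    return -1;
}
```

Capping the delay keeps the program responsive once the driver comes back, while the doubling avoids hammering a device that is still resetting.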
#2101
Sep 2008
Bromley, England
43 Posts |
I'm getting this error, if it's of any use to anybody:
Code:
D:\Cuda\CUDALucas>CUDALucas_205Beta_x64_r52.exe -cufftbench 1 8192 5
------- DEVICE 0 -------
name GeForce GTX 570
Compatibility 2.0
clockRate (MHz) 1464
memClockRate (MHz) 1900
totalGlobalMem 1342177280
totalConstMem 65536
l2CacheSize 655360
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 15
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1
CUDA bench, testing reasonable fft sizes 1K to 8192K, doing 5 passes.
fft size = 1K, ave time = 0.0273 msec, max-ave = 0.00060
fft size = 2K, ave time = 0.0329 msec, max-ave = 0.00684
fft size = 3K, ave time = 0.0716 msec, max-ave = 0.00737
fft size = 4K, ave time = 0.0540 msec, max-ave = 0.00778
fft size = 5K, ave time = 0.0529 msec, max-ave = 0.00765
fft size = 6K, ave time = 0.0525 msec, max-ave = 0.00007
fft size = 7K, ave time = 0.1198 msec, max-ave = 0.00315
fft size = 8K, ave time = 0.0514 msec, max-ave = 0.00005
fft size = 9K, ave time = 0.0540 msec, max-ave = 0.00015
fft size = 10K, ave time = 0.0605 msec, max-ave = 0.00526
fft size = 12K, ave time = 0.0639 msec, max-ave = 0.00018
fft size = 14K, ave time = 0.0599 msec, max-ave = 0.00308
fft size = 15K, ave time = 0.1332 msec, max-ave = 0.00249
fft size = 16K, ave time = 0.0682 msec, max-ave = 0.00304
fft size = 18K, ave time = 0.0624 msec, max-ave = 0.00113
fft size = 20K, ave time = 0.0726 msec, max-ave = 0.00299
fft size = 21K, ave time = 0.0738 msec, max-ave = 0.00237
fft size = 24K, ave time = 0.0891 msec, max-ave = 0.00284
fft size = 25K, ave time = 0.1378 msec, max-ave = 0.00328
fft size = 27K, ave time = 0.1417 msec, max-ave = 0.00018
fft size = 28K, ave time = 0.0928 msec, max-ave = 0.00369
fft size = 30K, ave time = 0.0948 msec, max-ave = 0.00302
fft size = 32K, ave time = 0.0824 msec, max-ave = 0.00008
fft size = 35K, ave time = 0.1550 msec, max-ave = 0.00241
fft size = 36K, ave time = 0.0995 msec, max-ave = 0.00247
fft size = 40K, ave time = 0.1051 msec, max-ave = 0.00247
fft size = 42K, ave time = 0.1085 msec, max-ave = 0.00206
fft size = 45K, ave time = 0.1684 msec, max-ave = 0.00200
fft size = 48K, ave time = 0.1081 msec, max-ave = 0.00262
fft size = 49K, ave time = 0.1212 msec, max-ave = 0.00285
fft size = 50K, ave time = 0.1188 msec, max-ave = 0.00167
fft size = 54K, ave time = 0.1316 msec, max-ave = 0.00317
fft size = 56K, ave time = 0.1183 msec, max-ave = 0.00104
fft size = 60K, ave time = 0.1417 msec, max-ave = 0.00308
fft size = 63K, ave time = 0.1869 msec, max-ave = 0.00165
fft size = 64K, ave time = 0.1429 msec, max-ave = 0.00278
fft size = 70K, ave time = 0.1678 msec, max-ave = 0.00228
fft size = 72K, ave time = 0.1714 msec, max-ave = 0.00192
fft size = 75K, ave time = 0.2364 msec, max-ave = 0.00292
fft size = 80K, ave time = 0.1697 msec, max-ave = 0.00258
fft size = 81K, ave time = 0.1969 msec, max-ave = 0.00242
fft size = 84K, ave time = 0.1873 msec, max-ave = 0.00353
fft size = 90K, ave time = 0.1956 msec, max-ave = 0.00296
fft size = 96K, ave time = 0.1912 msec, max-ave = 0.00261
fft size = 98K, ave time = 0.2060 msec, max-ave = 0.00251
fft size = 100K, ave time = 0.2082 msec, max-ave = 0.00247
fft size = 105K, ave time = 0.2809 msec, max-ave = 0.01314
fft size = 108K, ave time = 0.2220 msec, max-ave = 0.00268
fft size = 112K, ave time = 0.2066 msec, max-ave = 0.00269
fft size = 120K, ave time = 0.2396 msec, max-ave = 0.00223
fft size = 125K, ave time = 0.3224 msec, max-ave = 0.00303
fft size = 126K, ave time = 0.2600 msec, max-ave = 0.00272
fft size = 128K, ave time = 0.2473 msec, max-ave = 0.00267
fft size = 135K, ave time = 0.3416 msec, max-ave = 0.00243
fft size = 140K, ave time = 0.2864 msec, max-ave = 0.00152
fft size = 144K, ave time = 0.2645 msec, max-ave = 0.00264
fft size = 147K, ave time = 0.3659 msec, max-ave = 0.00392
fft size = 150K, ave time = 0.3193 msec, max-ave = 0.00333
fft size = 160K, ave time = 0.2903 msec, max-ave = 0.00386
fft size = 162K, ave time = 0.3330 msec, max-ave = 0.00242
fft size = 168K, ave time = 0.3331 msec, max-ave = 0.00439
fft size = 175K, ave time = 0.4022 msec, max-ave = 0.00344
fft size = 180K, ave time = 0.3385 msec, max-ave = 0.00578
fft size = 189K, ave time = 0.4385 msec, max-ave = 0.00424
fft size = 192K, ave time = 0.3540 msec, max-ave = 0.00371
fft size = 196K, ave time = 0.3763 msec, max-ave = 0.00530
fft size = 200K, ave time = 0.3905 msec, max-ave = 0.00511
fft size = 210K, ave time = 0.4171 msec, max-ave = 0.00389
fft size = 216K, ave time = 0.4135 msec, max-ave = 0.00383
fft size = 224K, ave time = 0.3805 msec, max-ave = 0.00748
fft size = 225K, ave time = 0.4789 msec, max-ave = 0.00789
fft size = 240K, ave time = 0.4466 msec, max-ave = 0.01557
fft size = 243K, ave time = 0.4917 msec, max-ave = 0.00647
fft size = 245K, ave time = 0.5389 msec, max-ave = 0.00815
fft size = 250K, ave time = 0.4767 msec, max-ave = 0.00844
fft size = 252K, ave time = 0.4824 msec, max-ave = 0.00267
fft size = 256K, ave time = 0.4456 msec, max-ave = 0.00454
fft size = 270K, ave time = 0.5332 msec, max-ave = 0.00474
fft size = 280K, ave time = 0.5253 msec, max-ave = 0.00931
fft size = 288K, ave time = 0.4752 msec, max-ave = 0.01467
fft size = 294K, ave time = 0.5797 msec, max-ave = 0.01844
fft size = 300K, ave time = 0.5838 msec, max-ave = 0.01188
fft size = 315K, ave time = 0.6671 msec, max-ave = 0.00862
fft size = 320K, ave time = 0.5398 msec, max-ave = 0.00571
fft size = 324K, ave time = 0.6093 msec, max-ave = 0.00350
fft size = 336K, ave time = 0.6200 msec, max-ave = 0.00447
fft size = 343K, ave time = 0.6894 msec, max-ave = 0.00486
fft size = 350K, ave time = 0.6783 msec, max-ave = 0.00658
fft size = 360K, ave time = 0.6460 msec, max-ave = 0.00605
fft size = 375K, ave time = 0.8148 msec, max-ave = 0.00743
fft size = 378K, ave time = 0.7359 msec, max-ave = 0.00680
fft size = 384K, ave time = 0.6703 msec, max-ave = 0.00187
fft size = 392K, ave time = 0.7014 msec, max-ave = 0.00381
fft size = 400K, ave time = 0.7023 msec, max-ave = 0.00418
fft size = 405K, ave time = 0.8098 msec, max-ave = 0.00206
C:/CUDA/CuLu/src/CUDALucas.cu(1877) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
C:/CUDA/CuLu/src/CUDALucas.cu(1886) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
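A side note on the sizes in this listing: every one from 1K through 405K has no prime factor larger than 7 (for example 105 = 3·5·7 and 343 = 7³), while sizes such as 11K, 13K or 22K are skipped. That is consistent with cuFFT's documented preference for transform lengths of the form 2^a·3^b·5^c·7^d. Whether CUDALucas builds its "reasonable fft sizes" list exactly this way is an assumption; the C sketch below only reproduces the pattern, and its function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if n > 0 has no prime factor larger than 7. */
bool is_7_smooth(unsigned n)
{
    static const unsigned primes[] = {2, 3, 5, 7};
    assert(n > 0);
    for (size_t i = 0; i < sizeof primes / sizeof primes[0]; i++)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;
}

/* Store the 7-smooth sizes in [lo_k, hi_k] (units of 1K) in out, up to
 * max_out entries; returns how many such sizes exist in the range. */
size_t smooth_fft_sizes(unsigned lo_k, unsigned hi_k,
                        unsigned *out, size_t max_out)
{
    size_t count = 0;
    for (unsigned k = lo_k; k <= hi_k; k++) {
        if (k == 0 || !is_7_smooth(k))
            continue;
        if (count < max_out)
            out[count] = k;
        count++;
    }
    return count;
}
```

For example, smooth_fft_sizes(1, 405, buf, 128) returns 95, the same number of sizes the run above completed before the launch timed out.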