[QUOTE=flashjh;359655]My fft.txt file says that Threads=512 256 256 is the best setting for me to use for the current FFT range I'm in, so I leave it there and that works for the -r test too.[/QUOTE]
Thanks, Jerry. That seems to complete the puzzle. "Threads=512 256 256" seems to have let me complete 'CUDALucas -cufftbench 1 8192 1' twice, when it mostly would not complete with 1024 1024 1024 or 256 256 256. I found the threads file which CUDAPm1 generated for my 580, and it agrees with your numbers. The card is still throttled way back. I'm going to see if it will run with at least the core clocked up a bit.

EDIT: A partial correction is in order. Before the last few runs I also switched the display from the 580 to the GTX 570. This was at Antonio's suggestion, and also seemed to play a role in stabilizing the 580.

EDIT2: I declared victory prematurely. The latest attempt with -r yielded:
[CODE]E:\CUDA\2.05-BETA>CUDALucas -r
------- DEVICE 0 -------
name                  GeForce GTX 580
Compatibility         2.0
clockRate (MHz)       1564
memClockRate (MHz)    1600
totalGlobalMem        1610612736
totalConstMem         65536
l2CacheSize           786432
sharedMemPerBlock     49152
regsPerBlock          32768
warpSize              32
memPitch              2147483647
maxThreadsPerBlock    1024
maxThreadsPerMP       1536
multiProcessorCount   16
maxThreadsDim[3]      1024,1024,64
maxGridSize[3]        65535,65535,65535
textureAlignment      512
deviceOverlap         1

Starting self test M86243 fft length = 4K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration 100, average error = 0.15317, max error = 0.23438
Iteration 200, average error = 0.16318, max error = 0.24521
Iteration 300, average error = 0.16738, max error = 0.23047
Iteration 400, average error = 0.17024, max error = 0.25000
Iteration 500, average error = 0.17168, max error = 0.25000
Iteration 600, average error = 0.17195, max error = 0.23438
Iteration 700, average error = 0.17195, max error = 0.20947
Iteration 800, average error = 0.17240, max error = 0.21875
Iteration 900, average error = 0.17285, max error = 0.25000
Iteration 1000, average error = 0.17264 <= 0.25 (max error = 0.25000), continuing test.
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 4K, CUDALucas v2.05 Beta err = 0.28125 (0:01 real, 0.0825 ms/iter)
This residue is correct.
The fft length 16K is too large for the exponent 132049. Restart with smaller fft.[/CODE]
The above happens regardless of which card the monitor is connected to.
[QUOTE=Antonio;359427]The -2GiB is reported by 2.05-Beta-x64, downloaded today.
The card has +2GiB of memory installed.[/QUOTE] My mistake. I fixed the code, but only put it into CUDAPm1. I'll put it into CUDALucas this weekend. That version of the code will be up soon.
[QUOTE=kladner;359469]I am having a pretty rough time with 2.05 on the GTX 580.
'CUDALucas -cufftbench 1 8192 1' crashes on the GTX 580 and brings down graphics driver 327.23 (which restarts). 782 MHz core, 1600 MHz VRAM. This is just the latest of many tests; occasionally the test completes. Rolling back the driver from 331.65 to 327.23 made no difference. 2.04-beta successfully completes 'CUDALucas -cufftbench 32768 3276800 32768' and has turned in good DCs at 830 MHz core, 1600 MHz VRAM. I haven't yet tried running a DC on 2.05-beta. I have the card throttled back from where it normally runs mfaktc to stock: from 844 MHz to 782 MHz. The RAM is 400 MHz below stock. Any suggestions would be appreciated.

EDIT: Tried running an exponent, 30651xxx, on 2.05 at 830 MHz core, 1600 MHz VRAM. It started with a 1728K FFT instead of stepping up to it from 1600K as 2.04 did, and crashed a bit after the 40,000th iteration.
[CODE]Iteration 10000 M( 30651671 )C, 0x6b79bd6d5adfb7de, n = 1728K, CUDALucas v2.05 Beta err = 0.05396 (0:26 real, 2.8857 ms/iter, ETA 24:33:42)
Iteration 20000 M( 30651671 )C, 0x53064732900985e9, n = 1728K, CUDALucas v2.05 Beta err = 0.06055 (0:26 real, 2.6133 ms/iter, ETA 22:14:09)
Iteration 30000 M( 30651671 )C, 0xe85abecfe0f40dce, n = 1728K, CUDALucas v2.05 Beta err = 0.05469 (0:26 real, 2.6123 ms/iter, ETA 22:13:12)
Iteration 40000 M( 30651671 )C, 0xa4208cf27dd73713, n = 1728K, CUDALucas v2.05 Beta err = 0.06250 (0:26 real, 2.6123 ms/iter, ETA 22:12:48)
CUDALucas.cu(310) : cudaSafeCall() Runtime API error 30: unknown error.[/CODE][/QUOTE]
This is an Nvidia driver error. I used to think it only occurred when the card was also driving the display, but recently I got this error on a 570 which was not driving the display (it didn't report an error; it just hung indefinitely). It seems to have been introduced with the 300+ drivers.
[QUOTE=flashjh;359527]I just completed a DC on M57885161 with CUDALucas 2.05-Beta-x64. It completed without error and I even switched FFT sizes a few times. Since I have the full run of residues from the first time I ran it, I was able to check progress along the way.
The only issue I found so far was keyboard input. If Interactive=n is set to 1 in the .ini file, then any time I pressed a key the program would stop progress; GPU usage dropped to about 50%, but ^c still stopped the run. I could restart with no problems. Has anyone else seen this in Windows or Linux? Can some others test this to see whether it's working in Windows and Linux? I haven't run all the FFT benchmarks yet; I'll do that now. Is anyone else having a problem with the amount of memory reported by CUDALucas?[/QUOTE] I have seen this keyboard input problem. If I run cmd.exe (I think that's its name), keyboard input doesn't work. The other console program, whatever it's called, does work with keyboard input.
[QUOTE=kladner;359541]Another observation/question- should the savefiles of v 2.04beta be more than three times as large as those of v 2.05beta? .....EDIT: for the same exponent?[/QUOTE]
Yes, they should be.
[QUOTE=Manpowre;359641]
. . . Starting self test M43112609 fft length = 2304K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration 100, average error = 0.17969, max error = 0.28125
Iteration 200, average error = 0.20398, max error = 0.26563
Iteration 300, average error = 0.21162, max error = 0.27344
Iteration 400, average error = 0.21489, max error = 0.28125
Iteration 500, average error = 0.21730, max error = 0.28125
Iteration 600, average error = 0.21847, max error = 0.26563
Iteration 700, average error = 0.21941, max error = 0.25781
Iteration 800, average error = 0.22026, max error = 0.25879
Iteration 900, average error = 0.22068, max error = 0.26172
Iteration 1000, average error = 0.22089 <= 0.25 (max error = 0.28125), continuing test.
Iteration 10000 M( 43112609 )C, 0x62871c7027ff12c8, n = 2304K, CUDALucas v2.05 Beta [COLOR=Olive]err = 0.50000[/COLOR] (0:21 real, 2.0989 ms/iter)
Expected residue [e86891ebf6cd70c4] does not match actual residue [62871c7027ff12c8]
. . . (tested on titan)[/QUOTE]
Notice the round-off error. Which driver are you using?
[QUOTE=flashjh;359647]An issue:
When running CUDALucas -r with a GeForce GTX --- fft.txt you [I]may[/I] get the error:
[CODE]The fft length 32K is too large for the exponent 216091. Restart with smaller fft.[/CODE]
Removing the file, as noted above, fixes the error. So when -cufftbench is run and the .txt file is generated, I presume the FFTs are tuned correctly; however, the new 'less tolerant' code won't accept those values for use in the self test.

Also, can someone explain the updated threads in 2.05? Is it still necessary to have threads in the .ini file? Why three values instead of one? What is the interaction with the new .txt file? Thanks[/QUOTE]
Or, instead of deleting the threads.txt file, insert a line with 16 as its only entry before the line with 32 on it. This is a lack of foresight on my part: even though 32K FFTs are faster than 16K or other smaller FFTs big enough to handle 216091, some of those smaller FFTs are still needed. I'll think about how to fix this.

As for threads in the ini file: there are three kernels whose performance depends on the number of threads they are invoked with. 2.04 and earlier fixed the threads for two of them at 128, which is a good compromise. Those values should be the defaults in the ini file; I don't know how 1024 snuck its way in as the default. The values in threads.txt override the ini values.
[QUOTE=owftheevil;359697]Notice the round-off error. Which driver are you using?[/QUOTE]
Latest nvidia 331.65. cudalucas 2.03 doesn't give me a wrong residue. I took the GPU clock down by 200 MHz, and it runs fine again with 2.05.
[QUOTE=Manpowre;359724]Latest nvidia 331.65
cudalucas 2.03 doesn't give me a wrong residue. I took the GPU clock down by 200 MHz, and it runs fine again with 2.05.[/QUOTE] Good to hear. What clocks are you running at now?
[QUOTE=owftheevil;359736]Good to hear. What clocks are you running at now?[/QUOTE]
780 MHz core, stock memory clock. I started the residue test again for 20 repeats an hour ago, so I will complete this, then take the clock back up to stock at 880 MHz and retest with 20 residue runs.
The new code is compiled and the windows binaries (release/debug) are posted on SourceForge.
@owftheevil: The -memtest functions, but something isn't right with the iterations. For example, '56 1000 1' on my 580 says ETA 12181:18:07 :smile: I posted a working memtest.zip to [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/?"]sourceforge[/URL]

EDIT: Please only use 2.05 Beta .exe files for testing the code. It is not ready for production use yet. Thanks!