mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Dubslow 2012-09-26 20:29

[QUOTE=LaurV;312859]Note that CL is a bit stupid when 1472K/1504K is selected: it switches to a higher FFT for SOME of the expos in the range (only for SOME), which is not normal if you use 256 threads (for 512 the FFT must be a multiple of 64K, so it may be normal). Anyhow, when 1472 and 1504 DO run, they are slower than 1536 and 1568.

Also note that the best choice after 1600 is much higher, 1728; the others in between are really bad. To make it clear, I ended the log with a -cufftbench, and if you like you can convert the lengths to K by dividing by 1024 (1474560 is 1440K, 1572864 is 1536K, etc.).[/QUOTE]

See, your mistake is assuming that CL is any kind of smart :razz:

Like I've mentioned before, selection just picks the smallest length from the list that's > exp/20, and the list was chosen from Prime95's jump tables. Clearly exp/20 isn't very good :razz: Even the test for a too-small length appears to be too aggressive (i.e. the first iterations aren't very representative of the "average").
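
The selection rule described above can be sketched in a few lines of Python. This is only an illustration: `EXAMPLE_LENGTHS` is a made-up list (not CL's actual table, which comes from Prime95's jump tables), and `select_fft_length` is a hypothetical helper name.

```python
from bisect import bisect_right

K = 1024

# Hypothetical candidate lengths, in elements. NOT CUDALucas's real list.
EXAMPLE_LENGTHS = [n * K for n in (1024, 1280, 1440, 1536, 1600, 1792, 2048)]


def select_fft_length(exponent, lengths=EXAMPLE_LENGTHS):
    """Return the smallest length in the sorted list that is > exponent / 20."""
    threshold = exponent / 20
    # bisect_right gives the index of the first entry strictly greater
    # than the threshold.
    i = bisect_right(lengths, threshold)
    if i == len(lengths):
        raise ValueError("exponent too large for the candidate list")
    return lengths[i]


# With this example list, M33271093 lands on 1835008 (1792K).
print(select_fft_length(33271093))
```

Because the rule looks only at the exponent, it cannot account for how a given card performs at each length, which is why hand-picking a length after benchmarking can beat it.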

Since kladner reports your timings aren't as good on a 460, it's a reasonable conclusion that the best FFT lengths vary significantly with the speed/threads/memory/etc. of the card in use. I'll add the lengths you mention to CL's list, if they aren't there already, and then the README will get a paragraph about how the automatic selection really isn't optimal, and that the curious user should experiment to squeeze the most speed out of CL. (At least the optimal is close :razz:)

kladner 2012-09-27 13:53

Latest run GTX 460
 
Fortunately, this only ran a couple of hours before I caught it. 1440K had been selected, and the average error was in the range which eventually got stopped by -t on a previous expo. Too bad it's so close to the edge. In this case 1440K was faster.
These were run at Polite 90. I forgot to hit P on the first and didn't want to do it over.

Test results-
[CODE]Starting M27303xxx fft length = 1440K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.18088, max error = 0.25000
Iteration 200, average error = 0.20465, max error = 0.24219
Iteration 300, average error = 0.21246, max error = 0.26367
Iteration 400, average error = 0.21672, max error = 0.25000
Iteration 500, average error = 0.22009, max error = 0.29102
Iteration 600, average error = 0.22095, max error = 0.24219
Iteration 700, average error = 0.22143, max error = 0.26563
Iteration 800, average error = 0.22165, max error = 0.25000
Iteration 900, average error = 0.22200, max error = 0.28125
Iteration 1000, average error = 0.22289 < 0.25 (max error = 0.26563), continuing test.
Iteration 1000 M( 27303xxx )C, 0xbfe1e2e63170f3b9, n = 1440K, CUDALucas v2.04 Beta err = 0.0000 (0:06 real, 5.9563 ms/iter, ETA 45:10:19)
Iteration 2000 M( 27303xxx )C, 0x7f6a3d1c91672f83, n = 1440K, CUDALucas v2.04 Beta err = 0.2578 (0:05 real, 4.6708 ms/iter, ETA 35:25:16)
Iteration 3000 M( 27303xxx )C, 0x658167a19034ff0c, n = 1440K, CUDALucas v2.04 Beta err = 0.2656 (0:04 real, 4.6578 ms/iter, ETA 35:19:18)
Iteration 4000 M( 27303xxx )C, 0xcfd036cabb908829, n = 1440K, CUDALucas v2.04 Beta err = 0.2813 (0:05 real, 4.6606 ms/iter, ETA 35:20:30)
Iteration 5000 M( 27303xxx )C, 0x69aae7e03823395c, n = 1440K, CUDALucas v2.04 Beta err = 0.2734 (0:05 real, 4.6578 ms/iter, ETA 35:19:08)

1536K
Iteration 4000 M( 27303xxx)C, 0xcfd036cabb908829, n = 1536K, CUDALucas v2.04 Beta err = 0.0547 (0:05 real, 4.9678 ms/iter, ETA 37:40:15)

1568K
Iteration 4000 M( 27303xxx )C, 0xcfd036cabb908829, n = 1568K, CUDALucas v2.04 Beta err = 0.0361 (0:05 real, 5.2011 ms/iter, ETA 39:26:25)[/CODE]
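
To put the trade-off in numbers, here is a small Python sketch that compares the iteration-4000 lines from the log above. The figures are copied from that log (GTX 460); the dictionary layout and the `slowdown_percent` helper are just illustrative choices.

```python
# ms/iter and reported err at iteration 4000, taken from the log above.
RUNS = {
    "1440K": {"ms_per_iter": 4.6606, "err": 0.2813},
    "1536K": {"ms_per_iter": 4.9678, "err": 0.0547},
    "1568K": {"ms_per_iter": 5.2011, "err": 0.0361},
}


def slowdown_percent(ms_per_iter, baseline_ms):
    """Percentage by which a run is slower than the baseline run."""
    return (ms_per_iter / baseline_ms - 1.0) * 100.0


baseline = RUNS["1440K"]["ms_per_iter"]
for name, run in RUNS.items():
    pct = slowdown_percent(run["ms_per_iter"], baseline)
    print(f"{name}: {run['ms_per_iter']} ms/iter "
          f"({pct:+.1f}% vs 1440K), err = {run['err']}")
```

On these numbers the larger lengths cost several percent in speed but bring the reported error well below the 0.25 to 0.35 range where the checks start to trigger.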

patrik 2012-09-27 15:42

What does this error mean?
 
I'm starting to learn to run CUDALucas and I get the following error message:

[CODE]C:\Users\Patrik Johansson\Documents\CUDALucas>CUDALucas-2.03-cuda4.0-sm_20-x86-64.exe -t worktodo.txt

Warning: No ini file detected. Using defaults for non-specified options.
Starting M33271093 fft length = 1835008
iteration = 1301 >= 1000 && err = 0.5 >= 0.35, fft length = 1835008, writing checkpoint file (because -t is enabled) and exiting.

C:\Users\Patrik Johansson\Documents\CUDALucas>CUDALucas-2.03-cuda4.0-sm_20-x86-64.exe -t worktodo.txt

Warning: No ini file detected. Using defaults for non-specified options.
Continuing work from a partial result of M33271093 fft length = 1835008 iteration = 1202
iteration = 2601 >= 1000 && err = 0.5 >= 0.35, fft length = 1835008, writing checkpoint file (because -t is enabled) and exiting.

C:\Users\Patrik Johansson\Documents\CUDALucas>[/CODE]

What does this mean? Do I have to manually select a different FFT size?

EDIT: Downloading CUDALucas.ini seems to have solved the problem.
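
Reading the log messages quoted in this thread literally, the round-off checks appear to follow roughly the logic below. This Python sketch is inferred from those messages alone, not from the CUDALucas source; the function name and the returned labels are made up, and the case of a late error without -t is left open because no log here shows it.

```python
CAREFUL_ITERS = 1000   # length of the "careful round off test"
AVG_ERR_LIMIT = 0.25   # average-error limit checked at the end of that test
ERR_LIMIT = 0.35       # per-iteration error limit seen in the messages


def roundoff_action(iteration, err, avg_err=None, t_enabled=False):
    """Guess the action CUDALucas takes, based only on its log messages."""
    if err >= ERR_LIMIT:
        if iteration < CAREFUL_ITERS:
            # "Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n"
            return "increase_fft"
        if t_enabled:
            # "... writing checkpoint file (because -t is enabled) and exiting."
            return "checkpoint_and_exit"
        return "not_shown_in_logs"  # assumption: behavior without -t is unknown
    if (iteration == CAREFUL_ITERS and avg_err is not None
            and avg_err >= AVG_ERR_LIMIT):
        # "If average error >= 0.25, the test will restart with a larger FFT length."
        return "restart_with_larger_fft"
    return "continue"


print(roundoff_action(1301, 0.5, t_enabled=True))
```

An err of exactly 0.5 well after iteration 1000 therefore stops the run under -t, whichever FFT length was chosen at the start.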

kladner 2012-09-27 16:16

@ Patrik

Glad you got it fixed. If you want to take the trouble, you could run -cufftbench as described at the end of the ini file. It is quite possible that you can get noticeably better performance from selecting the FFT length. There is considerable discussion of this in the last couple of pages of this thread. Note that you can leave FFT=0 (auto) in the ini, and specify a custom length on the worktodo line for the expo. This is also described in the last section of the ini.

Dubslow 2012-09-27 17:01

[QUOTE=kladner;312935]Note that you can leave FFT=0 (auto) in the ini, and specify a custom length on the worktodo line for the expo. This is also described in the last section of the ini.[/QUOTE]

Actually, that's only in 2.04, and he's using 2.03. (Note the full FFT length is given, not the length/1024.)
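
For reference, converting between the two notations is just a division by 1024. A tiny Python sketch (the `to_k` helper is a made-up name):

```python
K = 1024


def to_k(length):
    """Convert a full FFT length (e.g. 1835008) to its K form (e.g. 1792)."""
    quotient, remainder = divmod(length, K)
    if remainder:
        raise ValueError(f"{length} is not a multiple of {K}")
    return quotient


# The 2.03 log prints 1835008, which is 1792K in 2.04's notation.
print(f"{to_k(1835008)}K")
```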

kladner 2012-09-27 17:16

[QUOTE=Dubslow;312938]Actually, that's only in 2.04, and he's using 2.03. (Note the full FFT length is given, not the length/1024.)[/QUOTE]

Oopsy! :redface: Thanks for correcting that.

patrik 2012-09-27 18:09

Well, the error occurred again. Does this mean that I should choose a different FFT size, or is it my hardware? (The card I use, a Gigabyte GTX 570, is overclocked by default, IIRC.)

[CODE]C:\Users\Patrik Johansson\Documents\CUDALucas>CUDALucas-2.03-cuda4.0-sm_20-x86-64.exe -t worktodo.txt

Starting M33271093 fft length = 1835008
Iteration 10000 M( 33271093 )C, 0x5348d62b85363b87, n = 1835008, CUDALucas v2.03 err = 0.2031 (0:38 real, 3.7536 ms/iter, ETA 34:40:44)
Iteration 20000 M( 33271093 )C, 0xd261b237d0ea981a, n = 1835008, CUDALucas v2.03 err = 0.2188 (0:37 real, 3.7551 ms/iter, ETA 34:40:55)
Iteration 30000 M( 33271093 )C, 0x0d91040de48abe77, n = 1835008, CUDALucas v2.03 err = 0.2188 (0:38 real, 3.7991 ms/iter, ETA 35:04:42)
[---]
Iteration 200000 M( 33271093 )C, 0x165d9f1fb29fb1f4, n = 1835008, CUDALucas v2.03 err = 0.2188 (0:38 real, 3.7499 ms/iter, ETA 34:26:49)
Iteration 210000 M( 33271093 )C, 0x0fe59984a99d2238, n = 1835008, CUDALucas v2.03 err = 0.2188 (0:37 real, 3.7483 ms/iter, ETA 34:25:17)
iteration = 216901 >= 1000 && err = 0.5 >= 0.35, fft length = 1835008, writing checkpoint file (because -t is enabled) and exiting.

C:\Users\Patrik Johansson\Documents\CUDALucas>[/CODE]

Dubslow 2012-09-27 18:24

[QUOTE=patrik;312950]Well, the error occurred again. Does this mean that I should choose a different FFT size, or is it my hardware? (The card I use, a Gigabyte GTX 570, is overclocked by default, IIRC.)

[CODE]C:\Users\Patrik Johansson\Documents\CUDALucas>CUDALucas-2.03-cuda4.0-sm_20-x86-64.exe -t worktodo.txt

Starting M33271093 fft length = 1835008
Iteration 10000 M( 33271093 )C, 0x5348d62b85363b87, n = 1835008, CUDALucas v2.03 err = 0.2031 (0:38 real, 3.7536 ms/iter, ETA 34:40:44)
Iteration 20000 M( 33271093 )C, 0xd261b237d0ea981a, n = 1835008, CUDALucas v2.03 err = 0.2188 (0:37 real, 3.7551 ms/iter, ETA 34:40:55)
Iteration 30000 M( 33271093 )C, 0x0d91040de48abe77, n = 1835008, CUDALucas v2.03 err = 0.2188 (0:38 real, 3.7991 ms/iter, ETA 35:04:42)
[---]
Iteration 200000 M( 33271093 )C, 0x165d9f1fb29fb1f4, n = 1835008, CUDALucas v2.03 err = 0.2188 (0:38 real, 3.7499 ms/iter, ETA 34:26:49)
Iteration 210000 M( 33271093 )C, 0x0fe59984a99d2238, n = 1835008, CUDALucas v2.03 err = 0.2188 (0:37 real, 3.7483 ms/iter, ETA 34:25:17)
iteration = 216901 >= 1000 && err = 0.5 >= 0.35, fft length = 1835008, writing checkpoint file (because -t is enabled) and exiting.

C:\Users\Patrik Johansson\Documents\CUDALucas>[/CODE][/QUOTE]

Run the self test, which (IIRC) is -r.

If all those pass, then just increase the FFT length; otherwise, reduce the clock speed if you can. kladner et al. can be more helpful in determining if it's the card or not. (Personally, I highly doubt it, but you never know.)

kladner 2012-09-27 20:22

Hi Patrik,

Which Gigabyte card do you have? Mine is a GV-N570OC-13I, the 3 fan model. This is just curiosity on my part. I will be extremely interested in your progress with this card, whichever it is. I have yet to succeed in getting mine to run CuLu.

If you want/need to pursue testing beyond what Dubslow suggested there are a few possibilities: OCCT and MemtestG80 especially.

MemtestG80: [url]http://folding.stanford.edu/English/DownloadUtils#ntoc2[/url]

There are versions for Windows, Linux, and Mac.

OCCT (Windows only): [url]http://www.ocbase.com/index.php/download[/url]

For MemtestG80 I suggest a command line like this-
[CODE]memtestg80 -g 0 -b 1140 200[/CODE]
-g [N] sets GPU number starting from 0, if you have more than one.
-b bypasses a query if you want to send data from the tests to Stanford.
1140 is the largest amount of memory I got to run on my card (1.25GiB)
200 is the number of iterations which will be run. Your choice.

While the app is labeled as a bandwidth test, it also keeps a running total of errors detected.

OCCT is a stress tester with different tabs for GPU, CPU, and PSU. On the GPU tab, check Full Screen and Error Check. You can play with the length of the run. This will heat up the GPU on the same order as FurMark (a lot!), but it provides monitoring so you can track it.

I hope you get things worked out. I have currently put my 570 back on mfaktc duty because I couldn't make CuLu work. It works great on my Gigabyte GTX 460.

patrik 2012-09-28 12:09

I think we have identical cards: GV-N570OC-13I V2.0

It passes both MemtestG80 and the three minutes of OCCT that I ran before the rising temperatures made me worried.

The error in CUDALucas is not reproducible and happens at different iterations. Also in the self-test it fails at different exponents (but most often at M20996011).
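
Since the failure shows up at different iterations, it may help to pull the err values out of saved console output and compare runs. A rough Python sketch follows. The regular expression is inferred only from the progress lines quoted in this thread (v2.03 and v2.04 Beta) and may not match other versions; `parse_progress` is a made-up helper, not part of CUDALucas.

```python
import re

# Pattern inferred from the progress lines quoted in this thread.
PROGRESS_RE = re.compile(
    r"Iteration (?P<iteration>\d+) M\( *(?P<exponent>\S+?) *\)C, "
    r"(?P<residue>0x[0-9a-f]{16}), n = (?P<n>\w+), "
    r"CUDALucas v\S+(?: Beta)? err = (?P<err>[\d.]+) "
    r"\(\S+ real, (?P<ms_per_iter>[\d.]+) ms/iter, ETA (?P<eta>[\d:]+)\)"
)


def parse_progress(line):
    """Return the fields of one progress line, or None if it does not match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return {
        "iteration": int(m["iteration"]),
        "n": m["n"],
        "err": float(m["err"]),
        "ms_per_iter": float(m["ms_per_iter"]),
    }


sample = ("Iteration 10000 M( 33271093 )C, 0x5348d62b85363b87, n = 1835008, "
          "CUDALucas v2.03 err = 0.2031 (0:38 real, 3.7536 ms/iter, ETA 34:40:44)")
print(parse_progress(sample))
```

Note that the pattern expects each progress line on a single line, so console output that was hard-wrapped at 80 columns needs to be rejoined first.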

kladner 2012-09-28 15:26

1 Attachment(s)
I just set up a test folder to try CuLu on the 570 again. It still quits the same way on the self-test (last few lines):
[CODE]Starting M2976221 fft length = 8000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 8000K
Starting M2976221 fft length = 8192K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED[/CODE]It seems that this card did not get as far as M20996011 on this run. I'll give it a couple more tries.

EDIT: Did you get the error 6 result? "CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED"

EDIT2: The attached text file shows the tail end of the next -r run. This time it got to M2976221 as well, but tested up to a ridiculous FFT length (5760K) at which point it reports
[CODE]Iteration 1000, average error = 0.00450 < 0.25 (max error = 0.00000), continuing test.
Iteration = 1701 >= 1000 && err = 0.5 >= 0.35, fft length = 5760K, writing checkpoint file (because -t is enabled) and exiting.[/CODE]I am at a loss. Back to mfaktc.

