mersenneforum.org CUDALucas: which binary to use?

 2015-10-12, 05:51 #1 Karl M Johnson     Mar 2010 3×137 Posts CUDALucas: which binary to use?

Good morning. I decided to do some small research on binary selection for CUDALucas. I downloaded the available binaries from Sourceforge and used a nearly default configuration file. I used M43112609 to measure LL iteration times, with a 2352K FFT size and 256 threads for both square and splice, all running on an ol' GTX Titan. Each figure below is the average of five measurements (ms/iteration):

Code:
C_4.2: 1.8538
C_5.0: 1.82502
C_5.5: 1.8279
C_6.0: 1.83506
C_6.5: 1.6982

What this means is that for this particular exponent range (M43ish), the binary compiled with CUDA toolkit v6.5 performs best on sm_50 hardware. Now, the same series of tests on M58496057 with a 4320K FFT size:

Code:
C_4.2: 3.52024
C_5.0: 3.40014
C_5.5: 3.32784
C_6.0: 3.31762
C_6.5: 3.33362

The situation changes: the binary compiled with CUDA toolkit 6.0 is the fastest on sm_50 hardware for the 58Mish exponent range, though not by much.

Miscellaneous observations:

1. For some reason, while the C4.2-C5.5 binaries are fine with a smaller starting FFT length, the C6.0 and C6.5 binaries require bigger FFT sizes, and this erratic behaviour always occurs. Why this happens is beyond my knowledge of the topic. There is more to it than that:

Code:
Using threads: square 256, splice 256.
Starting M58496057 fft length = 4320K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration = 80 < 1000 && err = 0.50000 > 0.35, increasing n from 4320K
The fft length 4608K is too large for exponent 58496057, decreasing to 4320K
Using threads: square 256, splice 256.
Starting M58496057 fft length = 4320K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration 100, average error = 0.00021, max error = 0.00032
Iteration 200, average error = 0.00024, max error = 0.00033
Iteration 300, average error = 0.00025, max error = 0.00033
Iteration 400, average error = 0.00026, max error = 0.00032
Iteration 500, average error = 0.00026, max error = 0.00034
Iteration 600, average error = 0.00026, max error = 0.00032
Iteration 700, average error = 0.00027, max error = 0.00032
Iteration 800, average error = 0.00027, max error = 0.00032
Iteration 900, average error = 0.00027, max error = 0.00032
Iteration 1000, average error = 0.00027 <= 0.25 (max error = 0.00034), continuing test.

Doesn't look right, does it? It tries increasing the FFT size because of a round-off error, then drops back to the same size and runs without problems. Some initialisation bug?

2. The situation may (and I have a feeling it will) be different for NV cards with other shader models. Tracking down "golden" binaries for particular exponents isn't easy, and factoring in particular shader models makes it tougher. One day the developers of CUDALucas may have to consider maintaining only a single build of CUDALucas and deprecating the rest, thus "embracing progress".

Comments, along with other CUDA builds (especially 7.0 and 7.5), are welcome!
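[For reference, the retry rule quoted in the log above can be sketched roughly like this. This is a Python sketch of the decision rule as printed by the program, not CUDALucas's actual C code; the `fft_table` and `largest_valid_k` parameters are hypothetical stand-ins for the program's internal FFT-length table and per-exponent size limit:]

```python
# Sketch of the "careful round off test" retry rule quoted in the log:
# restart with a longer FFT when average error > 0.25 or max error > 0.35,
# but never pick an FFT length that is too large for the exponent.

AVG_LIMIT = 0.25
MAX_LIMIT = 0.35

def next_fft(current_k, fft_table, largest_valid_k):
    """Pick the next FFT length (in K) after a round-off failure."""
    bigger = [k for k in fft_table if k > current_k]
    if not bigger:
        return current_k
    candidate = min(bigger)
    # "The fft length ... is too large for exponent ..., decreasing"
    return candidate if candidate <= largest_valid_k else current_k

def careful_test(errors, current_k, fft_table, largest_valid_k):
    """errors: iterable of (avg_error, max_error) samples from the test."""
    for avg, mx in errors:
        if avg > AVG_LIMIT or mx > MAX_LIMIT:
            return next_fft(current_k, fft_table, largest_valid_k)
    return current_k  # all samples fine: continue the test at this length
```

[With the numbers from the log (4320K current, 4608K next, but 4608K too large for M58496057), this rule reproduces the bounce back to 4320K, which is why a single spurious err = 0.5 at iteration 80 looks like an initialisation glitch rather than a genuinely undersized FFT.]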
 2015-10-12, 06:08 #2 LaurV Romulan Interpreter     Jun 2011 Thailand 2²·7·11·29 Posts

There is nothing wrong with the program; your FFT is just too big. For this expo I think a ~3M FFT would work better and faster than the 4.3M FFT you use. When I reach home I will check with my CUDALucas setup. [edit: the 3M estimate comes from looking at your error size; you may need a bit more than 3M. Generally, the best way (i.e. optimal, fast and still safe to test) is when the error is around 0.2 (say from 0.15 to 0.25, depending on your FFT selection). This FFT is definitely too big.]

Last fiddled with by LaurV on 2015-10-12 at 06:11
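[LaurV's rule of thumb above can be sketched as a simple selection rule: prefer the smallest FFT length whose measured max round-off error lands in the 0.15-0.25 sweet spot, since an error far below that band suggests the FFT is wastefully large. The error figures in the usage are illustrative, not measured:]

```python
# Pick an FFT length by LaurV's rule of thumb: below the hard limit (0.35),
# ideally with a max round-off error around 0.2 -- safe, but not oversized.

HARD_LIMIT = 0.35
SWEET_LOW, SWEET_HIGH = 0.15, 0.25

def pick_fft(measurements):
    """measurements: dict mapping FFT length (K) -> measured max error."""
    safe = {k: e for k, e in sorted(measurements.items()) if e < HARD_LIMIT}
    if not safe:
        return None  # everything errors out: need an even bigger FFT
    # Prefer the smallest length whose error is already in the sweet spot...
    for k, e in safe.items():
        if SWEET_LOW <= e <= SWEET_HIGH:
            return k
    # ...otherwise just take the smallest safe length.
    return min(safe)
```

[By this rule, a 4320K FFT with a max error of 0.0003 sits far below the sweet spot, which is LaurV's point: a ~3M FFT would likely still be safe and measurably faster.]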
 2015-10-12, 06:31 #3 Karl M Johnson     Mar 2010 110011011₂ Posts

Indeed, I've done more tests and found that this "bug", or whatever is happening here, can't always be reproduced. Once the proper binary is picked for a given FFT size, proper internal benchmarks should be run to find the best FFT size/thread/splice combinations.

Last fiddled with by Karl M Johnson on 2015-10-12 at 06:49 Reason: yes
 2015-10-12, 09:41 #4 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 2²·3·479 Posts Sounds like an unstable card to me.
 2015-10-12, 11:19 #5 Karl M Johnson     Mar 2010 3·137 Posts Not unlikely, even though I don't recall ever submitting a bad LL test before migrating from 337.xx Forceware. As usual, this needs more scientific testing. Any comments on the method? Any hints on how CUDALucas behaves with newer CUDA toolkit versions? Last fiddled with by Karl M Johnson on 2015-10-12 at 11:20 Reason: yes
2015-10-12, 15:06   #6
Madpoo
Serpentine Vermin Jar

Jul 2014

CCD₁₆ Posts

Quote:
 Originally Posted by Karl M Johnson Not unlikely, even though I don't recall ever submitting a bad LL test before migrating from 337.xx Forceware. As usual, needs more scientific testing. Any comments on the method? Any hints regarding CUDALucas and how it works with newer CUDA toolkit versions?
I figured I'd throw out the obvious possibility: the 6.5 version mentioned might have more conservative settings as they relate to memory usage, perhaps? Thus the larger FFT being impacted in some way.

The basic code itself won't change with different FFT sizes, so the way the program allocates memory is, I think, the big variable here, and 6.0 / 6.5 must not be entirely the same in that regard.

I've never used any of the CUDA compilers so I can't be more specific, but I'd look at whether any default options have changed, especially as they relate to memory. Maybe 6.5 had some build option that makes it spend a little more time in garbage collection or something weird, something that would affect a larger memory chunk more than a smaller one. Or maybe it allocates memory differently, etc. etc.

Last fiddled with by Madpoo on 2015-10-12 at 15:07

 2015-10-12, 16:37 #7 Karl M Johnson     Mar 2010 3×137 Posts Okay, I thought the hardware was unstable and might have lost some overclocking headroom, but as it turns out, it's not entirely related to that. So far I've gotten no bad residues with the C4.2 binaries, but this doesn't mean anything yet. The C6.0-C6.5 (and possibly C5.5) binaries could indeed be more demanding of hardware stability, even if previous versions worked flawlessly for years. I will report my further findings.
2015-10-12, 16:53   #8
chalsall
If I May

"Chris Halsall"
Sep 2002

9,323 Posts

Quote:
 Originally Posted by Karl M Johnson Will report my further findings.
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" -- Isaac Asimov

2015-10-12, 17:37   #9
frmky

Jul 2003
So Cal

2052₁₀ Posts

Quote:
 Originally Posted by Karl M Johnson 2. The situation may (and I have a feeling it will) be different for NV cards with other shader models. Tracking down "golden" binaries for particular exponents isn't easy, and factoring in particular shader models makes it tougher. One day the developers of CUDALucas may have to consider maintaining only a single build of CUDALucas and deprecating the rest, thus "embracing progress".
I don't spend time worrying about small differences. When I upgrade to a newer CUDA toolkit, I recompile CUDALucas and rerun the fft benchmark and thread benchmark. This usually does end up with somewhat different preferred fft sizes. I then run a few double checks, and once they come back fine I trash the old version.

2015-10-12, 17:56   #10
chalsall
If I May

"Chris Halsall"
Sep 2002

9,323 Posts

Quote:
 Originally Posted by frmky I don't spend time worrying about small differences ...and once they are fine I trash the old version.
So, then, you throw important information away?

WTF?

 2015-10-12, 18:15 #11 frmky     Jul 2003 So Cal 2²·3³·19 Posts I throw information away, yes. I question whether it's important.
