#1 |
Mar 2010
3·137 Posts |
Good morning.
I decided to do a small study of binary selection for CUDALucas. I downloaded the available binaries from Sourceforge and used a nearly default configuration file. I used M43112609 to time the LL calculation, with a 2352K FFT size and 256 for both threads and splice, all running on a good ol' GTX Titan. Each figure below is the average of five readings: Code:
C_4.2: 1.8538
C_5.0: 1.82502
C_5.5: 1.8279
C_6.0: 1.83506
C_6.5: 1.6982
Now, the same series of tests on M58496057 with a 4320K FFT size: Code:
C_4.2: 3.52024
C_5.0: 3.40014
C_5.5: 3.32784
C_6.0: 3.31762
C_6.5: 3.33362
Miscellaneous observation(s):
1. For some reason, while the C4.2 - C5.5 binaries are OK with a smaller starting FFT length, the C6.0 and C6.5 binaries require bigger FFT sizes, and this erratic behaviour occurs every time. Why this happens is beyond my knowledge of the topic. There is more to it than that: Code:
Using threads: square 256, splice 256.
Starting M58496057 fft length = 4320K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration = 80 < 1000 && err = 0.50000 > 0.35, increasing n from 4320K
The fft length 4608K is too large for exponent 58496057, decreasing to 4320K
Using threads: square 256, splice 256.
Starting M58496057 fft length = 4320K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration 100, average error = 0.00021, max error = 0.00032
Iteration 200, average error = 0.00024, max error = 0.00033
Iteration 300, average error = 0.00025, max error = 0.00033
Iteration 400, average error = 0.00026, max error = 0.00032
Iteration 500, average error = 0.00026, max error = 0.00034
Iteration 600, average error = 0.00026, max error = 0.00032
Iteration 700, average error = 0.00027, max error = 0.00032
Iteration 800, average error = 0.00027, max error = 0.00032
Iteration 900, average error = 0.00027, max error = 0.00032
Iteration 1000, average error = 0.00027 <= 0.25 (max error = 0.00034), continuing test.
Some initialisation bug?
2. The situation may (and I have a feeling it will) be different for NVIDIA cards with other shader models. Tracking particular "golden" binaries for particular exponents isn't easy, and adding particular shader models into that makes it tougher. One day the developers of CUDALucas may have to consider maintaining only a single build of CUDALucas and deprecating the rest, thus "embracing progress".
Comments, along with results from other CUDA builds ('specially 7.0 and 7.5), are welcome! |
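For anyone who wants to compare the builds at a glance, here is a minimal sketch that ranks the averages posted in #1 and shows how much slower each build is than the fastest one for that exponent. The numbers are copied from the post; the function name `rank_builds` and the dictionary layout are just illustrative choices, and the post does not state the unit of the timings, so the script does not assume one.

```python
# Average timings copied from post #1 (unit not stated in the post).
timings = {
    "M43112609 @ 2352K": {
        "C_4.2": 1.8538, "C_5.0": 1.82502, "C_5.5": 1.8279,
        "C_6.0": 1.83506, "C_6.5": 1.6982,
    },
    "M58496057 @ 4320K": {
        "C_4.2": 3.52024, "C_5.0": 3.40014, "C_5.5": 3.32784,
        "C_6.0": 3.31762, "C_6.5": 3.33362,
    },
}


def rank_builds(results):
    """Return (build, time, percent slower than fastest), fastest first."""
    best = min(results.values())
    ordered = sorted(results.items(), key=lambda item: item[1])
    return [(build, t, 100.0 * (t - best) / best) for build, t in ordered]


if __name__ == "__main__":
    for test, results in timings.items():
        print(test)
        for build, t, pct in rank_builds(results):
            print(f"  {build}: {t:.5f}  (+{pct:.2f}% vs fastest)")
```

With these figures the fastest build differs between the two exponents (C_6.5 for the 2352K run, C_6.0 for the 4320K run), which is the point of the original comparison.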
#2 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2²×7×367 Posts |
There is nothing wrong with the program; your FFT is simply too big. For this exponent I think a ~3M FFT would work better and faster than the 4M3 FFT you are using. When I get home I will check with my CUDALucas setup.
[edit: the 3M is an estimate based on your error size. You may need a bit more than 3M. Generally, the best setting (i.e. optimal, fast and safe to test) is when the error is around 0.2 (say from 0.15 to 0.25, depending on your FFT selection). This FFT is definitely too big] Last fiddled with by LaurV on 2015-10-12 at 06:11 |
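The rule of thumb above can be written down as a small check. This is only a sketch: the 0.25 / 0.35 restart limits are the ones printed in the CUDALucas log in post #1, while the 0.15 to 0.25 window is the suggestion from this post. Applying that window to the maximum error (rather than the average) is my assumption, and `assess_fft` is a made-up helper name, not anything in CUDALucas.

```python
# Restart limits as printed in the CUDALucas log quoted in post #1.
AVG_ERR_LIMIT = 0.25
MAX_ERR_LIMIT = 0.35

# Target window suggested in post #2 (rule of thumb, not a CUDALucas setting).
# Assumption: the window is applied to the maximum round-off error.
TARGET_LOW = 0.15
TARGET_HIGH = 0.25


def assess_fft(avg_err, max_err):
    """Classify an FFT length from the round-off errors it produced."""
    if avg_err > AVG_ERR_LIMIT or max_err > MAX_ERR_LIMIT:
        return "too small: restart with a longer FFT"
    if max_err < TARGET_LOW:
        return "probably too big: a shorter FFT may be faster"
    if max_err <= TARGET_HIGH:
        return "in the suggested window"
    return "usable, but close to the restart limit"


if __name__ == "__main__":
    # Final values from the 4320K run in post #1.
    print(assess_fft(0.00027, 0.00034))
    # The single reading that triggered the restart at iteration 80
    # (the log reports one "err" value, used here as the max error).
    print(assess_fft(0.0, 0.50000))
```

For the 4320K run the errors sit far below the window, which matches the "definitely too big" reading above; the isolated 0.5 at iteration 80 is the only value that trips the restart limit.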
#3 |
Mar 2010
3×137 Posts |
Indeed, I've done more tests and found out that this "bug" or whatever's happening here can't always be reproduced.
Once the proper binary for any FFT size is picked, proper internal benchmarks should be done to find out the best FFT size/thread/splice combinations. Last fiddled with by Karl M Johnson on 2015-10-12 at 06:49 Reason: yes |
#4 |
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
178F₁₆ Posts |
Sounds like an unstable card to me.
|
#5 |
Mar 2010
633₈ Posts |
Not unlikely, even though I don't recall ever submitting a bad LL test before migrating from 337.xx Forceware.
As usual, needs more scientific testing. Any comments on the method? Any hints regarding CUDALucas and how it works with newer CUDA toolkit versions? Last fiddled with by Karl M Johnson on 2015-10-12 at 11:20 Reason: yes |
#6 | |
Serpentine Vermin Jar
Jul 2014
5·677 Posts |
![]() Quote:
The basic code itself won't change with different FFT sizes, so I think the big variable here is only the way the program allocates memory, and 6.0 / 6.5 must not be entirely the same in that regard. I've never used any of the CUDA compilers so I couldn't be more specific, but I'd look to see if any default options have changed, especially as they relate to memory. Like maybe 6.5 had some build option that makes it spend a little more time in garbage collection or something weird... something that would affect a larger memory chunk more than a smaller one. Or maybe something in the way it allocates memory differently, etc. etc. Last fiddled with by Madpoo on 2015-10-12 at 15:07 |
|
#7 |
Mar 2010
3·137 Posts |
Okay, I thought the hardware was unstable and might be losing overclocking potential, but as it turns out, it's not entirely related to that.
So far I've gotten no bad residues on C4.2 binaries, but this doesn't mean anything yet. (5.5)C6.0-C6.5 could indeed be more stability-demanding, even if previous versions worked flawlessly for years. Will report my further findings. |
#8 |
If I May
"Chris Halsall"
Sep 2002
Barbados
10101101000010₂ Posts |
![]() |
#9 | |
Jul 2003
So Cal
2597₁₀ Posts |
![]() Quote:
|
|
#10 |
If I May
"Chris Halsall"
Sep 2002
Barbados
2×7²×113 Posts |
![]() |
#11 |
Jul 2003
So Cal
7²·53 Posts |
I throw information away, yes. I question whether it's important.