#1959
"Kieren"
Jul 2011
In My Own Galaxy!
10158₁₀ Posts
For me, the performance hit for -t seems very small on a 580. Just guesstimating, ~5%, if that. Also, CPU use hovers around 1% or a bit more, even with Polite=70, and it doesn't change much when toggling Polite from 70 to 0 and back to 70. The 580 seems to react differently from the 570. I don't have a lot of hard figures, as I mostly prefer factoring work.
|
|
|
|
|
#1960
Romulan Interpreter
Jun 2011
Thailand
2⁶·151 Posts
Quote:
So, if you do DC and get a mismatch every 20 tests, and you rerun that test to get the right residue, then you break even compared with using the -t switch. (Caution! You can get a mismatch not because the card is bad, nor because something went wrong during testing, but simply because the residue in the DB is wrong. In any case, this assumes you DO rerun the test to "be sure".) If you get mismatches more often, you are better off with -t. If you get mismatches seldom, you waste time with -t (see my former post).
|
|
|
|
|
#1961
"Svein Johansen"
May 2013
Norway
3·67 Posts
I read this in the CUDA 5.5 release notes today:
On the GK110, the kernel occasionally may produce incorrect results. This happens when, either by loop unrolling or straight lines of code, there are more than 63 outstanding texture/LDG instructions at one point during the program execution. "Outstanding" in this case means none of the results of these instructions have been used. The underlying cause is that the texture barrier can track at most 63 outstanding texture/LDG instructions. If there are more than 63 such instructions, the texture barrier can no longer be relied on to ensure that any instruction's result is correct. This issue can be worked around by adding -maxrregcount 63 to ptxas. This guarantees there are at most 63 outstanding texture instructions because each texture/LDG will write at least one register. However, this may downgrade performance because it limits the maximum number of registers. (This issue has been fixed for CUDA 6.0.)
http://docs.nvidia.com/cuda/cuda-toolkit-release-notes/
|
|
|
|
#1962
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts
CUDALucas v2.05 Beta is posted to Sourceforge. The Windows executable is here. Linux is coming shortly.
Windows .exe is compiled for CUDA 5.5 x64 with sm_13, sm_20, sm_30 & sm_35. If you need the library files, they're here. If you want a different CUDA/sm version, let me know and I'll see if I can compile it. I'm still unable to compile a debug version; I'll keep working on it (has anyone gotten a debug version out of the MSVS command window using make?)

This is an extensive update with code for:
- bit shift (for DoubleCheck capability)
- on-the-fly FFT increase/decrease
- CheckRoundoffAllIterations: no longer an option, it is done every time now
- other updates listed here, on SourceForge, and in this thread

We need testing, please. Make a list of things that:
1) Don't work at all
2) Need work or fail after some testing
3) You would like to see added/modified

This is a quote from owftheevil about the updated FFT code:
Quote:
- Error recovery is a very serious issue. When P95 encounters a rounding error, it (usually) doesn't give up right away. CUDALucas needs the same functionality, as there is no reason to just quit after one rounding error.
  - Possible solution: multiple save files (like P95), so that when an error occurs, it tries the older file(s) before quitting. Possibly just jump to the oldest file, start comparing residues with the ones computed before the failure, and keep going as long as they match?
- Currently, CUDALucas can just stop in the middle of a run. I'm still working to track down the cause, but I think it may be related to the rounding issue.
- How does P95 know whether to turn in a result with or without error codes? CUDALucas could use this functionality.
- Getting everyone to understand that CUDALucas is VERY different from mfaktc/o: OC'ing is a CUDALucas killer on cards without ECC memory.
- CUDALucas needs to be tested and tested and tested to find the optimal FFT sizes for exponents on different GPUs. All that information should be in CUDALucas or in an external fft.txt file. I know some of this functionality is in CUDApm1 (and already in CUDALucas's code); it just needs work.
- Early detection of overclocked/defective cards: CUDALucas needs a robust self-test that runs on cards being used for the first time, just like P95 runs a benchmark (also on demand). CUDALucas could be coded to recognize cards and their 'normal' clocks and warn users that the OC is not a good idea. Moreover, I think a really good memory test in CUDALucas would let users test their cards properly. I know I read in the forum that someone developed a good GPU memory test; I just can't find the thread. Is that code open so we can add it to CUDALucas? EDIT: I found it here, written by owftheevil, and probably already in the code? EDIT2: done
Last fiddled with by flashjh on 2013-11-13 at 03:37
|
|
|
|
|
#1963
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
9,497 Posts
CuLu doesn't sound very good in Spanish or Portuguese. Or Italian.
Could we please limit the use of this silly abbreviation? It is called CUDALucas, as far as I know.
|
|
|
|
#1964
"Kieren"
Jul 2011
In My Own Galaxy!
10011110101110₂ Posts
Hi Jerry,
Compliments on a really informative, link-filled rollout of this update. I'll give it a try on a 570 and a 580. I'm pretty confident running the Gigabyte 570 VRAM at 1600 MHz. It is rated for 1900 MHz, but it is shaky running CL even at 1800. It might work at 1700, but I don't want to risk wasting a little over a day's work to find out. On the other hand, this particular 570 GPU is factory OC'd at 844 MHz and has given no trouble since I found a safe range to run the VRAM. The memory on the Asus 580 has given good DCs at its stock speed of 2004 MHz. GPU-Z reports the default clock as 782 MHz, which I think is a 10 MHz OC. It runs TF happily at 844 MHz, but I'd probably throttle back to 830, max, for CLucas. If I encounter problems, my first response will be to slow something down some more and try again.
|
|
|
|
#1965
"Kieren"
Jul 2011
In My Own Galaxy!
10011110101110₂ Posts
Here are a variety of (disorganized) results.
I will email the text output files to James, anon.
Last fiddled with by kladner on 2013-11-13 at 06:39
|
|
|
|
#1966
P90 years forever!
Aug 2002
Yeehaw, FL
7543₁₀ Posts
Can we change the intermediate output (example below) so that it does not look very much like the final result lines?
Code:
Iteration 54710000 M( 62807803 )C, 0x91c985b44391452b, n = 3670016, CUDALucas v2.03 err = 0.0735 (5:04 real, 30.3250 ms/iter, ETA 68:08:49)
Iteration 54720000 M( 62807803 )C, 0xece05e44bdde87f2, n = 3670016, CUDALucas v2.03 err = 0.0735 (5:03 real, 30.3262 ms/iter, ETA 68:03:55)
|
|
|
|
#1967
"Jerry"
Nov 2011
Vancouver, WA
1123₁₀ Posts
No problem. I can't believe you posted that, because I was literally just sitting here looking at the output line on the screen and reworking it so it will only take up one line. I'll get the code changed and uploaded.
Last fiddled with by flashjh on 2013-11-14 at 23:38 Reason: Can't spell
|
|
|
|
#1968
"James Heinrich"
May 2004
ex-Northern Ontario
2³×149 Posts
|
|
|
|
|
#1969
"Antonio Key"
Sep 2011
UK
3²·59 Posts
CUDALucas seems to think my GT640 has a hole where its memory should be: it says I have (minus) 2 GiB totalGlobalmem!
I find this strangely disquieting for software which is dealing with large numbers.
|
|
|
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Don't DC/LL them with CudaLucas | LaurV | Data | 131 | 2017-05-02 18:41 |
| CUDALucas / cuFFT Performance on CUDA 7 / 7.5 / 8 | Brain | GPU Computing | 13 | 2016-02-19 15:53 |
| CUDALucas: which binary to use? | Karl M Johnson | GPU Computing | 15 | 2015-10-13 04:44 |
| settings for cudaLucas | fairsky | GPU Computing | 11 | 2013-11-03 02:08 |
| Trying to run CUDALucas on Windows 8 CP | Rodrigo | GPU Computing | 12 | 2012-03-07 23:20 |