mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

kladner 2013-10-06 14:38

[QUOTE=Karl M Johnson;355209]Good explanation, [B]LaurV[/B].
I always run with the -t flag, as both the memory and the GPU are overclocked.[/QUOTE]

For me, the performance hit for -t seems very small on a 580: a guesstimate of ~5%, if that. Also, CPU use hovers around 1% or a bit more, even with Polite=70, and it doesn't change much when toggling Polite 70-0-70. The 580 seems to react differently than the 570. I don't have a lot of hard figures, as I mostly prefer factoring work.

LaurV 2013-10-07 02:26

[QUOTE=kladner;355414]For me, the performance hit for -t seems very small on a 580: a guesstimate of ~5%, if that. Also, CPU use hovers around 1% or a bit more, even with Polite=70, and it doesn't change much when toggling Polite 70-0-70. The 580 seems to react differently than the 570. I don't have a lot of hard figures, as I mostly prefer factoring work.[/QUOTE]
That is exactly the behavior I see on all my 580s (different brands), with the only difference that I use polite 1-0-1 (when I need the cards, I switch to full polite; when not, to aggressive; I don't "game", but some CAD software needs the card sometimes).
So, if you do DC, get a mismatch every 20 tests, and rerun that one test to get the right residue, then you break even with using the -t switch. (Caution: a mismatch does not have to mean that the card is bad or that something went wrong during testing; the residue in the DB may simply be wrong. The comparison holds only IF you rerun the test anyway, to "be sure".) If you get mismatches more often than that, you are better off with -t. If you get mismatches seldom, you waste time with -t (see my former post).
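A rough sketch of that break-even arithmetic, in Python. It assumes the ~5% cost of -t guessed above, that -t would have prevented every mismatch, and that each mismatch costs exactly one full rerun. The function names are made up for illustration.

```python
def cost_with_t(overhead):
    """Relative GPU time per good result when always running with -t."""
    return 1.0 + overhead


def cost_without_t(mismatch_rate):
    """Relative GPU time per good result when every mismatch is rerun once."""
    return 1.0 + mismatch_rate


def better_with_t(mismatch_rate, overhead=0.05):
    """True when rerunning mismatches costs more than the -t overhead."""
    return cost_without_t(mismatch_rate) > cost_with_t(overhead)


# One mismatch in 20 tests against a 5% overhead: both cost 1.05, so it is a tie.
print(cost_with_t(0.05), cost_without_t(1 / 20))
# One mismatch in 10 tests: -t wins. One in 100: -t wastes time.
print(better_with_t(1 / 10), better_with_t(1 / 100))
```

With a different measured overhead for -t, the break-even mismatch rate moves with it: under these assumptions it is simply equal to the overhead.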

Manpowre 2013-10-19 18:48

I read this about CUDA FFT 5.5 today:
[QUOTE]On the GK110, the kernel occasionally may produce incorrect results. This happens when, either by loop unrolling or straight lines of code, there are more than 63 outstanding texture/LDG instructions at one point during the program execution. "Outstanding" in this case means none of the results of these instructions have been used. The underlying cause is that the texture barrier can track at most 63 outstanding texture/LDG instructions. If there are more than 63 such instructions, the texture barrier can no longer be relied on to ensure that any instruction's result is correct.
This issue can be worked around by adding -maxrregcount 63 to ptxas. This guarantees there are at most 63 outstanding texture instructions because each texture/LDG will write at least one register. However, this may downgrade performance because it limits the maximum number of registers. (This issue has been fixed for CUDA 6.0.)[/QUOTE]

[url]http://docs.nvidia.com/cuda/cuda-toolkit-release-notes/[/url]

flashjh 2013-11-13 03:14

CUDALucas 2.05 beta and "CUDALucas Road Map"
 
[FONT=System]CUDALucas v2.05 Beta is posted to Sourceforge. The Windows executable is [/FONT][URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"][FONT=System][COLOR=#0000ff]here[/COLOR][/FONT][/URL][FONT=System]. Linux is coming shortly.[/FONT]

[FONT=System]The Windows .exe is compiled for CUDA 5.5 x64 with sm_13, sm_20, sm_30 and sm_35. If you need the library files, they're [/FONT][URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"][FONT=System][COLOR=#0000ff]here[/COLOR][/FONT][/URL][FONT=System]. If you want a different CUDA or sm version, let me know and I'll see if I can compile it. I'm still unable to compile a debug version, but I'll keep working on it. (Has anyone gotten a debug version out of the cmd window of MSVS using make?)[/FONT]

[FONT=System]This is an [U]extensive[/U] update with code for:[/FONT]
[FONT=System]- bit shift (for DoubleCheck capability)[/FONT]
[FONT=System]- on-the-fly FFT increase/decrease[/FONT]
[FONT=System]- CheckRoundoffAllIterations is no longer an option; the roundoff check now runs on every iteration[/FONT]
[FONT=System]- and other updates listed [/FONT][URL="http://www.mersenneforum.org/showthread.php?t=18791&page=2"][FONT=System][COLOR=#0000ff]here[/COLOR][/FONT][/URL][FONT=System] and on SourceForge and in this thread.[/FONT]

[FONT=System][U]We need testing please[/U]. Make a list of things that:[/FONT]

[FONT=System]1) Don't work at all[/FONT]
[FONT=System]2) Need work or fail after some testing[/FONT]
[FONT=System]3) You would like to see added/modified[/FONT]

[FONT=System]This is a quote from owftheevil about the updated FFT code:[QUOTE]Sorry, this is a bit too much for a diff file. Read the change log at the beginning of the file for a summary of what has been done. I have given the fft change routines a thorough checking, starting and later resuming a test with a much too small fft (273xxxxx exponent with 1200k fft). It quickly increases the fft length to the appropriate size and gives good residues. I ran a test of a 274xxxxx exponent with 1440k fft and the error limit set to 0.38. It increased and then decreased the fft 9 times during the test and ended with a matching residue. I also used the F and f options liberally with no ill effects on the residues.[/QUOTE]

Some things I have noted that need work:[/FONT]

[FONT=System]- Error recovery is a very serious issue. When P95 encounters a rounding error, it doesn't give up right away (usually). CUDALucas needs the same functionality as there is no reason to just quit after one rounding error.[/FONT]
[FONT=System]-- Possible solution: Multiple save files (like P95) and when an error occurs, it needs to try the older file(s) before quitting. Possibly just jump to the oldest file and start comparing residues with the ones done before failure and as long as they match, keep going?[/FONT]

[FONT=System]- Currently when running CUDALucas the program just stops. I'm still working to track down the cause, but I think it may be related to the rounding issue.[/FONT]

[FONT=System]- How does P95 know to turn in a result with or without error codes? CUDALucas could use this functionality.[/FONT]

[FONT=System]- Getting everyone to understand that CUDALucas is VERY different from mfaktc/o. [U]OC'ing is a CUDALucas killer on cards with no ECC memory.[/U][/FONT]

[FONT=System]- CUDALucas needs to be tested and tested and tested to optimize exponents with FFTs for different GPUs. All that code should be in CUDALucas or an external fft.txt file. I know some of this functionality is in [/FONT][URL="http://www.mersenneforum.org/showthread.php?p=359102#post359102"][FONT=System][COLOR=#0000ff]CUDApm1[/COLOR][/FONT][/URL][FONT=System] (and already in CUDALucas's code). It just needs work.[/FONT]

[FONT=System]- Early detection of overclocked/defective cards: CUDALucas needs a robust self test that runs on first-time cards, just like P95 runs a benchmark (on demand also). CUDALucas could be coded to recognize cards and their 'normal' clock and warn users that the OC is not a good idea. Moreover, I think that having a really good memory test in CUDALucas will allow users to test their cards properly. I know I read in the forum that someone developed a good GPU memory test, I just can't find the thread. Is that code open so we can add it to CUDALucas? EDIT: I found it [/FONT][URL="http://www.mersenneforum.org/showthread.php?p=339265#post339265"][FONT=System][COLOR=#0000ff]here[/COLOR][/FONT][/URL][FONT=System], written by owftheevil, and probably already in the code?[/FONT]

[FONT=System]EDIT2: done :smile:[/FONT]
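To make the "multiple save files" idea from the error recovery item above concrete, here is a toy sketch in Python. It is not CUDALucas code: it runs a plain integer Lucas-Lehmer test, keeps the last few save points in memory instead of on disk, and simulates a transient rounding error by marking chosen iterations as bad on their first attempt. On an error it jumps back to the oldest save point and checks the recomputed residues against the newer save points as it passes them. All names (`ll_test_with_rollback`, `save_every`, `keep`, `faulty_iterations`) are hypothetical.

```python
from collections import deque


def ll_test_with_rollback(p, save_every=4, keep=3, faulty_iterations=()):
    """Toy Lucas-Lehmer test of 2**p - 1 that recovers from simulated errors.

    Returns (is_prime, number_of_rollbacks).
    """
    m = (1 << p) - 1
    pending = set(faulty_iterations)       # iterations that fail once, then work
    saves = deque([(0, 4)], maxlen=keep)   # (iterations done, residue)
    expected = {}                          # residues saved before the failure
    done, s, rollbacks = 0, 4, 0
    while done < p - 2:
        if done + 1 in pending:
            # Simulated rounding error: do not quit, go back to the oldest save.
            pending.discard(done + 1)
            expected = dict(saves)
            done, s = saves[0]
            rollbacks += 1
            continue
        s = (s * s - 2) % m
        done += 1
        # While catching up, compare against residues saved before the failure.
        if done in expected and expected.pop(done) != s:
            raise RuntimeError(f"residue mismatch at iteration {done}")
        if done % save_every == 0 and saves[-1][0] < done:
            saves.append((done, s))
    return s == 0, rollbacks


# M13 = 8191 is prime; two simulated errors cost two rollbacks but not the test.
print(ll_test_with_rollback(13, faulty_iterations=(6, 10)))
```

A real implementation would also have to decide what to do when the recomputed residues do not match the saved ones (here it simply raises), and whether a repeat error at the same iteration should trigger a larger FFT instead of another retry.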

Batalov 2013-11-13 03:25

CuLu doesn't sound very good in Spanish or Portuguese. Or Italian.
Could we please limit the use of this silly abbreviation?

It is called CUDALucas, as far as I know.

kladner 2013-11-13 04:36

Hi Jerry,
Compliments on a really informative and link-filled rollout of this update. I'll give it a try on a 570 and a 580.

I'm pretty confident running the Gigabyte 570 VRAM at 1600 MHz. It is rated for 1900, but is shaky running CL even at 1800. It might work at 1700, but I don't want to potentially waste a little over a day's work to find out. On the other hand, this particular 570 GPU is factory OC'd at 844 MHz, and has not given trouble once I found a safe range to run the VRAM.

The memory on the Asus 580 has given good DCs at its stock speed of 2004 MHz. GPU-Z reports the default clock as 782 MHz, which I think is a 10 MHz OC. It runs TF happily at 844 MHz, but I'd probably throttle back to 830, max, for CLucas.

If I encounter problems, my first response will be to slow something down some more and try again.

kladner 2013-11-13 06:37

1 Attachment(s)
Here are a variety of (disorganized) results.

I will email the text output files to James, anon.

Prime95 2013-11-14 23:30

Can we change the intermediate output (example below) so that it does not look very much like the final result lines?

[CODE]Iteration 54710000 M( 62807803 )C, 0x91c985b44391452b, n = 3670016, CUDALucas v2.03 err = 0.0735 (5:04 real, 30.3250 ms/iter, ETA 68:08:49)
Iteration 54720000 M( 62807803 )C, 0xece05e44bdde87f2, n = 3670016, CUDALucas v2.03 err = 0.0735 (5:03 real, 30.3262 ms/iter, ETA 68:03:55)
[/CODE]

When "Lan_Party" submitted these lines, the manual web page gave CPU credit for each intermediate result. James, can the PHP script be modified to distinguish between the intermediate and final result lines?
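For illustration only, here is one way a results parser could separate the two kinds of line until the output format changes. The sketch is in Python rather than PHP, and it relies on a single observation from the sample above: the intermediate lines begin with `Iteration <count>`. That final result lines do not start this way is an assumption, and the function names are hypothetical.

```python
import re

# Pattern for the intermediate lines quoted above:
# "Iteration <count> M( <exponent> )C, 0x<16 hex digits>, ..."
INTERMEDIATE = re.compile(
    r"^Iteration\s+(?P<iteration>\d+)\s+"
    r"M\(\s*(?P<exponent>\d+)\s*\)C,\s*"
    r"0x(?P<residue>[0-9a-fA-F]{16}),"
)


def parse_intermediate(line):
    """Return (iteration, exponent, residue) or None if the line does not match."""
    match = INTERMEDIATE.match(line.strip())
    if match is None:
        return None
    return int(match["iteration"]), int(match["exponent"]), match["residue"]


def is_intermediate(line):
    """True for progress lines that should not earn credit."""
    return parse_intermediate(line) is not None


sample = ("Iteration 54710000 M( 62807803 )C, 0x91c985b44391452b, n = 3670016, "
          "CUDALucas v2.03 err = 0.0735 (5:04 real, 30.3250 ms/iter, ETA 68:08:49)")
print(parse_intermediate(sample))
```

An extra safety check would be to reject any line whose iteration count is lower than the exponent minus 2, since such a test cannot have finished.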

flashjh 2013-11-14 23:38

No problem. I can't believe you posted that, because I was literally just sitting here looking at the output line on the screen and reworking it so that it takes up only one line. I'll get the code changed and uploaded.

James Heinrich 2013-11-15 00:14

[QUOTE=Prime95;359314]James, can the PHP script be modified to distinguish between the intermediate and final result lines?[/QUOTE]Shouldn't be a problem, I'll look at it when I get back on Sunday.

Antonio 2013-11-15 01:32

CUDALucas seems to think my GT640 has a hole where its memory should be; it says I have [COLOR=#ff0000](minus)[/COLOR]2GiB totalGlobalmem!
I find this strangely disquieting for software which is dealing with large numbers :smile:
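One plausible explanation, offered as a guess without having looked at the code: the card has 2 GiB, and the byte count is being stored or printed as a signed 32-bit integer somewhere. 2 GiB is exactly 2^31 bytes, one more than the largest signed 32-bit value, so it wraps around to minus 2 GiB. The small Python sketch below just reproduces that arithmetic; the helper name is made up.

```python
GIB = 1024 ** 3


def as_signed_32(n):
    """Reinterpret the low 32 bits of n as a signed (two's complement) integer."""
    return (n + 2 ** 31) % 2 ** 32 - 2 ** 31


# 1 GiB still fits; 2 GiB wraps to a negative value.
print(as_signed_32(1 * GIB) / GIB)
print(as_signed_32(2 * GIB) / GIB)
```

If that is the cause, the fix would be to keep the value in an unsigned or 64-bit type and print it with a matching format specifier.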

