mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

flashjh 2015-02-05 02:40

[QUOTE=wombatman;394445]Yeah, I had used the batch file method too. Just wondered if there was a way to just have it run without crashing. Thanks![/QUOTE]

Take a look at [URL="http://www.mersenneforum.org/showthread.php?p=364436#post364436"]this[/URL]. It has been over a year since I did the testing and I don't know how CUDA 6.5 will do with everything. Either way, it really appears to be a driver issue.

Let me know what you find... :smile:

BTW - Yes, my binaries are for Windows.

wombatman 2015-02-05 04:46

Yeah, I'll try to run it through its paces. The little bit I've found so far (and you may already know all this) is that the error comes from Timeout Detection and Recovery, or TDR ([url]http://http.developer.nvidia.com/NsightVisualStudio/2.2/Documentation/UserGuide/HTML/Content/Timeout_Detection_Recovery.htm[/url]). Basically, if the GPU stops responding for 2 seconds (the default), Windows restarts the driver, and that restart is the error we get.

You can increase the delay either through the registry or with Nsight Monitor (under Options/General). Increasing it to 20 seconds got me through the -r 0 benchmark and up to 5000K on cufftbench 1 8192 6, but it still hung and crashed. The error given is below:
[CODE]CUDALucas.cu(2366) : cudaSafeCall() Runtime API error 30: unknown error.
CUDALucas.cu(1049) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.[/CODE]

The first line, 2366, refers to: [CODE]cutilSafeCall (cudaEventRecord (stop, 0));
err = cutilSafeCall1(cudaEventSynchronize (stop));[/CODE]

The second, 1049, refers to: [CODE]cutilSafeCall (cudaMalloc ((void **) &g_x, sizeof (double) * n));//size_d));[/CODE] in the alloc_gpu_mem function.

So maybe there is an issue with the synchronizing that causes a hang/error, and when you try to fix it by restarting the device, everything happens too quickly and you can't do the memory allocation? As you might imagine, I'm totally guessing here.
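
For reference, the two numeric codes in the error output above can be decoded without a debugger. The sketch below is a small lookup table using the runtime error numbering of CUDA toolkits from that era (4.x through 6.5); later toolkits renumbered some codes, so it should not be trusted for newer versions. The helper names are hypothetical and are not part of CUDALucas or the CUDA API. Code 6 is included only for comparison, since it is the timeout-specific code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Legacy CUDA runtime error numbers (CUDA 4.x-6.5 era).
   Assumption: the binary in question was built against such a toolkit. */
struct cuda_err_entry {
    int code;
    const char *name;
    const char *message;
};

static const struct cuda_err_entry LEGACY_CUDA_ERRORS[] = {
    { 0,  "cudaSuccess",                 "no error" },
    { 6,  "cudaErrorLaunchTimeout",      "the launch timed out and was terminated" },
    { 30, "cudaErrorUnknown",            "unknown error" },
    { 46, "cudaErrorDevicesUnavailable", "all CUDA-capable devices are busy or unavailable" },
};

static const size_t LEGACY_CUDA_ERRORS_LEN =
    sizeof LEGACY_CUDA_ERRORS / sizeof LEGACY_CUDA_ERRORS[0];

/* Hypothetical helper: enum name for a numeric code, or NULL if the code
   is not in this deliberately small table. */
const char *legacy_cuda_error_name(int code)
{
    for (size_t i = 0; i < LEGACY_CUDA_ERRORS_LEN; ++i) {
        if (LEGACY_CUDA_ERRORS[i].code == code)
            return LEGACY_CUDA_ERRORS[i].name;
    }
    return NULL;
}

/* Hypothetical helper: numeric code for an enum name, or -1 if unknown. */
int legacy_cuda_error_code(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < LEGACY_CUDA_ERRORS_LEN; ++i) {
        if (strcmp(LEGACY_CUDA_ERRORS[i].name, name) == 0)
            return LEGACY_CUDA_ERRORS[i].code;
    }
    return -1;
}
```

So error 30 at line 2366 is the generic cudaErrorUnknown, and error 46 at line 1049 is cudaErrorDevicesUnavailable, which fits the device being gone by the time the allocation is attempted.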


Edit: Also, it's worth mentioning that when I turned TDR off completely, I still got a hang from CUDALucas. So TDR is not directly responsible (I think) for the error. It's just what we see when the driver restarts. Also, the point at which CUDALucas hangs is inconsistent, even with the same command line parameters.

owftheevil 2015-02-05 12:47

Here's what I know about the bug.

It hangs during a cufft call.
It is specific to compute 2.0 cards.
It is most likely not a problem with cufft:
cufft4.2 with Nvidia driver 295.?? works
cufft4.2 with >30?.?? drivers shows the bug


In Linux, we can recover by resetting the device inside CUDALucas.


In Windows, the devices are deactivated after the timeout
so instead of continuing merrily on our way, we get that
memory allocation error you are seeing. CUDALucas needs
to be restarted to continue.
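
The Linux/Windows difference described above boils down to one decision: after a failed synchronize, does re-allocating memory following a device reset succeed? The sketch below models only that decision. It is not CUDALucas code; the enum and function names are hypothetical, and the status arguments are plain CUDA runtime error numbers (0 for success, with 30 and 46 being the values reported earlier in this thread).

```c
#include <assert.h>

enum recovery_action {
    KEEP_RUNNING,       /* nothing failed */
    RESUME_IN_PROCESS,  /* reset worked and memory was re-allocated */
    RESTART_PROCESS     /* device stayed unavailable: exit and relaunch */
};

/* sync_err:    status of the synchronize call that may have hung.
   realloc_err: status of re-allocating GPU memory after a device reset.
   Both are assumed to be non-negative CUDA runtime error numbers. */
enum recovery_action choose_recovery(int sync_err, int realloc_err)
{
    assert(sync_err >= 0 && realloc_err >= 0);
    if (sync_err == 0)
        return KEEP_RUNNING;
    if (realloc_err == 0)
        return RESUME_IN_PROCESS;   /* the Linux behaviour described above */
    return RESTART_PROCESS;         /* the Windows behaviour described above */
}
```

With the values from the Windows log, choose_recovery(30, 46) returns RESTART_PROCESS, which is why an external restart loop is needed there.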

monsted 2015-02-05 13:23

[QUOTE=flashjh;394429]2.05 Beta CUDA 6.5 x64 is [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]here[/URL] (passed self test)

I can also build the CUDA 7.0 binaries, but 7.0 is still Release Candidate, so if you experience bugs...

CUDA Libs are [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL]. I'll upload the 7.0 libs when 7.0 is final.[/QUOTE]

Thanks! Reworking that doublecheck now (38 hours to do M38635771,71,1 on a GTX970).

wombatman 2015-02-05 14:17

[QUOTE=owftheevil;394530]Here's what I know about the bug.

It hangs during a cufft call.
It is specific to compute 2.0 cards.
It is most likely not a problem with cufft:
cufft4.2 with Nvidia driver 295.?? works
cufft4.2 with >30?.?? drivers shows the bug


In Linux, we can recover by resetting the device inside CUDALucas.


In Windows, the devices are deactivated after the timeout
so instead of continuing merrily on our way, we get that
memory allocation error you are seeing. CUDALucas needs
to be restarted to continue.
[/QUOTE]

How odd that it is specific to 2.0 cards (which is what I have). For what it's worth, I started a CUDALucas run overnight (M65911957 with FFT size of 3584K), and at least as far back as I can see, which is around 2-3 hours, it has not errored out. So maybe increasing the TdrDelay registry value helps?

flashjh 2015-02-05 14:44

[QUOTE=wombatman;394540]How odd that it is specific to 2.0 cards (which is what I have). For what it's worth, I started a CUDALucas run overnight (M65911957 with FFT size of 3584K), and at least as far back as I can see, which is around 2-3 hours, it has not errored out. So maybe increasing the TdrDelay registry value helps?[/QUOTE]

If you use this batch file it will count the restarts and put the number in the title of the window. You can also send it to a log, if you want.
[CODE]
@echo off
Set count=0
Set program=CUDALucas2.05Beta-CUDA5.0-Win32-r60
:loop
TITLE %program% Current Reset Count = %count%
Set /A count+=1
rem echo %count% >> log.txt
rem echo %count%
%program%.exe
GOTO loop
[/CODE]
For what it's worth, I did a lot of testing and found that the restart problem, though irritating, didn't affect the results. So once you get it going, you should be OK. It was a hassle when trying to set up the cufftbench, though.

flashjh 2015-02-05 15:07

[STRIKE][QUOTE=owftheevil;394530]
It is specific to compute 2.0 cards.[/QUOTE]

The 970 is [URL="http://en.wikipedia.org/wiki/CUDA#Supported_GPUs"]CC 5.2[/URL], should it be affected?

@wombatman, do you want a CUDA 6.5, CC 5.2 only build to see if it's any faster?[/STRIKE]
Never mind... got people mixed up

Us old 2.0 card holders need to get with the times :smile:

wombatman 2015-02-05 15:47

I only have a CC 2.0 card, so I wouldn't be able to run it, unfortunately.

flashjh 2015-02-05 16:08

[QUOTE=monsted;394533]Thanks! Reworking that doublecheck now (38 hours to do M38635771,71,1 on a GTX970).[/QUOTE]

@monsted, do you want a CUDA 6.5, CC 5.2 only build to see if it's any faster?

I put them on [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL] for you, if you want to try.

monsted 2015-02-06 10:59

[QUOTE=flashjh;394554]@monsted, do you want a CUDA 6.5, CC 5.2 only build to see if it's any faster?

I put them on [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL] for you, if you want to try.[/QUOTE]
Tried it out, but it doesn't seem to have made any noticeable difference. ms/iter is just about 3.6080 with both binaries.

It does cut down the size of the binary, so I'm guessing it just doesn't carry the cores it wouldn't use anyway?

flashjh 2015-02-06 11:02

[QUOTE=monsted;394685]Tried it out, but it doesn't seem to have made any noticeable difference. ms/iter is just about 3.6080 with both binaries.

It does cut down the size of the binary, so I'm guessing it just doesn't carry the cores it wouldn't use anyway?[/QUOTE]

Yes, this binary was built only for CC 5.2. I tried the self test on a 580; it runs but gives all-zero residues.
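
One possible explanation for the all-zero residues on the 580, assuming the 5.2-only build contains machine code for that architecture alone and no PTX fallback (the thread does not say which compiler flags were used): machine code built for compute capability X.y only runs on devices of capability X.z with z >= y, so none of the kernels could load on a CC 2.0 card. The function below is a hypothetical illustration of that rule, not something taken from CUDALucas.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if device machine code built for compute capability
   built_major.built_minor can run on a device of capability
   dev_major.dev_minor. Assumes the binary carries no PTX that the
   driver could recompile for another architecture. */
bool sass_runs_on(int built_major, int built_minor,
                  int dev_major, int dev_minor)
{
    assert(built_major > 0 && dev_major > 0);
    assert(built_minor >= 0 && dev_minor >= 0);
    return dev_major == built_major && dev_minor >= built_minor;
}
```

Under that assumption, sass_runs_on(5, 2, 5, 2) holds for the GTX 970, while sass_runs_on(5, 2, 2, 0) is false for the GTX 580.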


All times are UTC.
