mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

mognuts 2013-12-26 18:37

[QUOTE=flashjh;362676]Windows r56 executables posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL][/QUOTE]

r56 just successfully completed double check of [URL="http://www.mersenne.org/report_exponent/?exp_lo=31010747&exp_hi=31010747&B1=Get+status"]31010747[/URL].

mognuts 2013-12-26 22:40

[QUOTE=mognuts;362958]r56 just successfully completed double check of [URL="http://www.mersenne.org/report_exponent/?exp_lo=31010747&exp_hi=31010747&B1=Get+status"]31010747[/URL].[/QUOTE]
During the run I hit a couple of these, but they didn't affect the result or cause the drivers to stop working.
[CODE]| Dec 26 22:33:05 | M 54297883 2820000 0xa5d98db4daef2036 | 3136K 0.06641 5.3687 53.68s | 3:04:59:19 5.19% |
| Dec 26 22:33:59 | M 54297883 2830000 0x4a7f94d8efb62886 | 3136K 0.06641 5.3695 53.69s | 3:04:58:22 5.21% |
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Dec 26 22:34:53 | M 54297883 2840000 0xa2c3927d7aefb869 | 3136K 0.06641 5.3689 53.68s | 3:04:57:25 5.23% |
| Dec 26 22:35:47 | M 54297883 2850000 0xf5b7f62de86145e4 | 3136K 0.06641 5.3705 53.70s | 3:04:56:28 5.24% |
C:/CUDA/CuLu/src/CUDALucas.cu(1509) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
Resetting device and restarting from last checkpoint.
Using threads: norm1 256, mult 256, norm2 1024.
C:/CUDA/CuLu/src/CUDALucas.cu(891) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.[/CODE]

frmky 2014-01-09 01:50

In 64-bit Linux, r56 segfaults when I run with -r, but has correctly completed a number of double-checks. r59 simply exits without starting a test from worktodo.txt.

owftheevil 2014-01-09 14:35

Does r59 work ok with -r?

Edit: For r59 exiting without starting a test: on line 3346 of CUDALucas.cu, remove the negation in front of get_next_assignment. That should do it, although I can't check it myself right now.

frmky 2014-01-09 23:41

No, r59 crashes as well.

[CODE]Program received signal SIGSEGV, Segmentation fault.
0x000000000040420f in init_lucas (x_packed=0x6ddba0, q=86243,
n=0x7fffffffe044, j=0x7fffffffe040, offset=0x7fffffffe03c, total_time=0x0,
time_adj=0x0, iter_adj=0x0) at CUDALucas.cu:1317
1317 *time_adj = *total_time;
(gdb) bt
#0 0x000000000040420f in init_lucas (x_packed=0x6ddba0, q=86243,
n=0x7fffffffe044, j=0x7fffffffe040, offset=0x7fffffffe03c, total_time=0x0,
time_adj=0x0, iter_adj=0x0) at CUDALucas.cu:1317
#1 0x000000000040bb65 in check_residue (ls=0) at CUDALucas.cu:2624
#2 0x000000000040df57 in main (argc=2, argv=0x7fffffffe1f8)
at CUDALucas.cu:3334
(gdb) print total_time
$1 = (unsigned long long *) 0x0
(gdb) print time_adj
$2 = (unsigned long long *) 0x0
[/CODE]
So time_adj is a null pointer.

owftheevil 2014-01-10 14:51

Thanks frmky, your information was very useful. r60 should fix these bugs.

flashjh 2014-01-12 05:02

Getting close to release!
 
r60 compiled and tested (it still needs more testing). Builds for CUDA 4.2 up to 5.5 are all working, both release and debug, and all are posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL].

This version (and every version from r57 on) includes new rcb code from Prime95 that gives about a 1% speed improvement! Exciting for CUDALucas, but it does need testing, please.

In my testing, the CUDA 5.5 and Win32 builds are slightly faster than earlier CUDA versions or x64 (but you may need a batch file to keep it going; see below).

What works: -cufftbench, -r, and normal testing.

What doesn't: -threadbench.

Didn't test: -memtest.

[U][B]For those experiencing stops: this is an nVidia driver issue. Here is some info, along with some workarounds.[/B][/U]

Drivers <=306.97 work perfectly fine with the x86/x64 CUDA 4.2 and CUDA 5.0 builds and produce no restarts (at least none in my testing over several days).

Drivers >=310.70 have resets no matter what platform or CUDA version, including CUDA 5.5 with drivers >=320.18.

There are two workarounds for anyone experiencing a problem similar to the one described by [URL="http://www.mersenneforum.org/showthread.php?p=362968#post362968"]mognuts[/URL]:

1) The best way to fix the error is to downgrade your driver to one of the versions <=306.97 as mentioned above.

CUDA Driver Versions:

[CODE]CUDA 5.5            CUDA 5.0                   CUDA 4.2
331.82  19-Nov-13   314.22  25-Mar-13          301.42  22-May-12
331.65  07-Nov-13   314.07  18-Feb-13          296.10  13-Mar-12
331.58  21-Oct-13   310.90  05-Jan-13          295.73  21-Feb-12
327.23  19-Sep-13   310.70  17-Dec-12          285.62  24-Oct-11
320.49  01-Jul-13   [B]306.97  10-Oct-12[/B]   280.26  09-Aug-11
320.18  23-May-13   306.23  13-Sep-12          275.33  01-Jun-11
[/CODE]
I did not actually test below 296.10, so I don't know at which driver version support changes over to < CUDA 4.2, but I figure most people will be on 296.10 or later by now.

Windows CUDALucas builds for CUDA 4.0 up to 5.5, both 32- and 64-bit, are on SourceForge.

Request: I need to know who else is having the *stop* issue and what driver and video card they have. I'm working with NVidia to try to get the drivers fixed, so it will help to know which other cards have this issue.

2) The other 'fix' for this issue is to use a batch file similar to this:
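[CODE]@echo off
rem Restart CUDALucas whenever it exits, e.g. after a driver reset
Set count=0
rem Name of the executable, without .exe
Set program=CUDALucas2.05Beta-CUDA5.0-Win32-r60
:loop
rem Show the number of restarts so far in the window title
TITLE %program% Current Reset Count = %count%
Set /A count+=1
rem Uncomment one of these to log the count to a file or the console
rem echo %count% >> log.txt
rem echo %count%
%program%.exe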
GOTO loop[/CODE]
This will restart CUDALucas each time it stops and lets you see how many resets have occurred, if you care.

I have not been able to test speeds thoroughly yet; I know that CUDA 5.5 is usually faster, but at the cost of driver lockups. Combined with the batch file, there really is no issue unless the restarts bother you; I've run many good DCs with the batch file.

With <=306.97, you don't need the batch file and there are no restarts, but it could potentially be *slightly* slower. I would love to see actual test data from everyone. Also, if anyone does experience the *stop* while on <=306.97, please let me know ASAP so I can update this info and tell nVidia.

As for reliability, I have completed many successful tests with 2.05 Beta, CUDA 4.0 up to 5.5, 32- and 64-bit, many of them with a lot of stops and restarts and forced FFT size changes to test the code.

:smile:

flashjh 2014-01-14 22:11

[URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]r62 posted[/URL] to fix the -threadbench problem

Usage for testing:

[B]CUDALucas -cufftbench lb ub p (e.g. CUDALucas -cufftbench 1 8192 6)[/B]

It gives a warning if either lb or ub is not a power of two. It still works when they are not, but non-optimal lengths near the edges of the range are likely to be included in <gpu> fft.txt.

[B]CUDALucas -threadbench lb ub p m (e.g. CUDALucas -threadbench 1 8192 6 1)[/B]
The new parameter m (usually 0 or 1) controls part of the test's behavior. With m = 0, all reasonable FFT lengths n between lb * 1K and ub * 1K are tested (n a multiple of 1K, largest prime factor of n at most 7); with m = 1, only the lengths in <gpu> fft.txt and the table in init_ffts are tested.

When testing the new versions, run:

[CODE]CUDALucas -r
CUDALucas -cufftbench 1 8192 6
CUDALucas -threadbench 1 8192 6 1[/CODE]

You can also run a [URL="http://www.mersenneforum.org/showthread.php?p=359754#post359754"]memtest[/URL]:

[CODE]CUDALucas -memtest k n[/CODE]
where k * 25 MB of memory is tested, and n * 10000 iterations are done for each of 5 data types at each of the k positions.

blip 2014-02-17 21:48

I got the following output:

[CODE]
-- polite interval increased to 2
-- error_reset increased to 95
[/CODE]

What does that mean?

flashjh 2014-02-17 22:39

Can you post your CUDALucas.ini and the command line you're using to run the program?

blip 2014-02-17 23:39

1 Attachment(s)
I am running it as

[CODE]
CUDALucas -d 0
[/CODE]


All times are UTC.