mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

LaurV 2014-03-21 16:44

[QUOTE=owftheevil;369553]How about backing up the old fft.txt file? (I mean instead of overwriting it) Other routines depend on fft.txt being in increasing order.[/QUOTE]
Perfect for me. Rename it like "(chip)fft_0.txt", "..._1.txt", on the same idea as [URL="http://www.mersenneforum.org/showpost.php?p=369388&postcount=27"]here[/URL] (second code box). Three copies are enough; if the user does not realize after two renames that the file is being [STRIKE]overwritten[/STRIKE] renamed, he is either not paying attention or does not care. Then I/he can manually interleave and sort the copies if wanted.

(edit: the only point is not to lose a LONG fft file, with all the reasonable sizes inside, [U]without notification[/U] (as happens now). Maybe I worked a full day to get that file and have no backup! I would be very angry then! :smile: - luckily I had several folders with the same content, having several of the same cards, and I had copies of the file in those folders; that may not always be the case)

(edit 2: the thread optimization works very nicely, and faster than in the older version. The only unchanged thing is that the results are saved only at the end, which could cause trouble after a crash. Here that is no real problem: the optimization is done only once in a card's lifetime, and it can be split into a few consecutive small jobs - that is, instead of a single "-threadbench 1 20480 6 0" I can run 3-4 smaller ranges. I was not smart enough to think of that in advance, so the job has been running since the last post. Fortunately it finished with success :smile:)
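The splitting idea can be sketched as follows. The exact meaning of the four "-threadbench" arguments is assumed here (a start/end FFT range plus two further parameters), so this is only an illustration of dividing one long run into consecutive chunks:

```python
def split_range(start, end, parts):
    """Divide an inclusive [start, end] range into consecutive chunks so a
    long benchmark can be run as several smaller jobs (illustrative; not
    part of CUDALucas itself)."""
    step = (end - start + 1) // parts
    chunks = []
    lo = start
    for i in range(parts):
        hi = end if i == parts - 1 else lo + step - 1
        chunks.append((lo, hi))
        lo = hi + 1
    return chunks

# e.g. turn one "-threadbench 1 20480 6 0" run into four smaller runs:
for lo, hi in split_range(1, 20480, 4):
    print(f"-threadbench {lo} {hi} 6 0")
```

A crash then loses at most one chunk instead of the whole benchmark.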

pdazzl 2014-04-01 20:29

Happening to me with GTX 570
 
Thanks for the restart batch file.

I am getting the API runtime errors even with the latest beta build r65 (running toolkit 5.0 and the latest 335.23 NVIDIA drivers)... however, this only happens on my GTX 570, not my 280. I have noticed that the 570 runs stable until I stop the job, switch to mfaktc, and then switch back to the LL job. The errors then keep happening until I reboot the box. So far that seems to be what triggers the API errors for me. I have never seen this behavior on my 280, even when switching between CUDALucas and mfaktc.



[QUOTE=flashjh;364436]r60 compiled and tested (still needs more). CUDA 4.2 up to 5.5 all working, release and debug. All posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL]

This version (and r57 and up) includes new rcb code from Prime95 that gives about a 1% speed improvement! Exciting for CUDALucas, but it does need testing, please.

In my testing, the CUDA 5.5 and Win32 builds are slightly faster than earlier versions or x64 (but you may need a batch file to keep them going; see below)

What works:
-cufftbench
-r
-normal testing

What Doesn't:
-threadbench

Didn't test:
-memtest

[U][B]For those experiencing stops: this is an NVIDIA driver issue. Here is some info, and I have included some workarounds[/B][/U]

Drivers <=306.97 work with the x86/x64 CUDA 4.2 and CUDA 5.0 builds perfectly fine and produce no restarts (at least none in my testing over several days).

Drivers >=310.70 show resets no matter what platform/CUDA version, including 5.5 with >=320.18.

There are two workarounds for anyone experiencing a similar problem described by [URL="http://www.mersenneforum.org/showthread.php?p=362968#post362968"]mognuts[/URL]:

1) The best way to fix the error is to downgrade your driver to one of the versions <=306.97 as mentioned above.

CUDA Driver Versions:

[CODE]CUDA 5.5            CUDA 5.0            CUDA 4.2
331.82 19-Nov-13    314.22 25-Mar-13    301.42 22-May-12
331.65 07-Nov-13    314.07 18-Feb-13    296.10 13-Mar-12
331.58 21-Oct-13    310.90 05-Jan-13    295.73 21-Feb-12
327.23 19-Sep-13    310.70 17-Dec-12    285.62 24-Oct-11
320.49 01-Jul-13    [B]306.97 10-Oct-12[/B]    280.26 09-Aug-11
320.18 23-May-13    306.23 13-Sep-12    275.33 01-Jun-11
[/CODE] I did not actually test below 296.10, so I don't know where the cutoff to < CUDA 4.2 drivers lies, but I figure most are on 296.10 or later by now.

Windows CUDALucas from CUDA 4.0 up to 5.5, 32 or 64 bit are on SourceForge

Request: I need to know who else is having the *stop* issue, and which driver and video card you have. I'm working with NVIDIA to try to get the drivers fixed, so it will be helpful to know which other cards have this issue.

2) The other 'fix' for this issue is to use a batch file similar to this:
[CODE]@echo off
rem Relaunch CUDALucas every time it stops (workaround for the driver resets)
Set count=0
Set program=CUDALucas2.05Beta-CUDA5.0-Win32-r60
:loop
TITLE %program% Current Reset Count = %count%
Set /A count+=1
rem echo %count% >> log.txt
rem echo %count%
%program%.exe
GOTO loop[/CODE] This will restart CUDALucas each time it stops and let you see how many resets have occurred, if you care.

I have not been able to thoroughly test speeds yet; I know that CUDA 5.5 is usually faster, but at the cost of the driver lockups. Combined with the batch file there is really no issue, other than the restarts themselves if they bother you; I've run many good DCs with the batch file.

With <=306.97, you don't need the batch file and there are no restarts, but it could potentially be *slightly* slower. I would love to see actual test data from everyone. Also, if anyone does experience the *stop* while on <=306.97, please let me know ASAP so I can update this info and nVidia.

As for reliability, I have completed many successful tests with 2.05 Beta, CUDA 4.0 up to 5.5, 32 and 64 bit. Many were run with a lot of stops and restarts and forced FFT size changes to test the code.

:smile:[/QUOTE]

MikeBerlin 2014-04-03 20:16

CUDALucas doesn't work anymore
 
[CODE]Round off error at iteration = 21463800, err = 0.5 > 0.40, fft = 3584K.
Increasing fft and restarting from last checkpoint.

Using threads: square 128, splice 256.

Continuing M62494429 @ iteration 21460001 with fft length 4096K, 34.34% done[/CODE]

After some more errors, the program stops.
If I restart it, it prints at the end (example):
Processing result: M( x )C, 0xy, offset = 6684, n = 4096K, CUDALucas v2.05 Beta, g_AID: A6ACCD2C719C7543871E42683998589C.
I think the result is bad.

owftheevil 2014-04-03 20:51

Can you recall what the "more errors" were?

The root problem is most likely memory; at least, that's the only time I see a roundoff error like that. But I don't know what's going on with the apparent output of a result after the errors.

MikeBerlin 2014-04-09 15:59

[QUOTE=owftheevil;370257]Can you recall what the "more errors" were?[/QUOTE]No, I can't; the logfile doesn't exist anymore. I tried the next attempt with the "savefile" option. Only when I reduce the memory speed (-500 MHz) and the "Power Limit" (57%) -> GPU = 692 MHz do I get fewer errors. But what does "g_AID" mean? (last result: M( 62494429 )C, 0x7191357b114a13__, offset = 31262106, n = 4096K, CUDALucas v2.05 Beta, g_AID: EEFBC9895C77C54B1AC676621FFA____)
How can I see whether my "computer must be proven reliable"?
Ohhh, I forgot to log in to GIMPS and lost my result. 157 GHz-days!

Now I found the same exponent as a double check in my assignments!

Batalov 2014-04-09 17:04

[QUOTE=MikeBerlin;370653]No, I can't; the logfile doesn't exist anymore. I tried the next attempt with the "savefile" option. Only when I reduce the memory speed (-500 MHz) and the "Power Limit" (57%) -> GPU = 692 MHz do I get fewer errors. But what does "g_AID" mean? (last result: M( 62494429 )C, 0x7191357b114a13__, offset = 31262106, n = 4096K, CUDALucas v2.05 Beta, g_AID: EEFBC9895C77C54B1AC676621FFA____)
How can I see whether my "computer must be proven reliable"?
Ohhh, I forgot to log in to GIMPS and lost my result. 157 GHz-days![/QUOTE]
PM user Prime95 and you will be helped.
[QUOTE]
Now I found in my assginments the same exponent as double check![/QUOTE]
That's not useful to you. The second GPU result will be given no credit.

MikeBerlin 2014-04-09 18:43

[QUOTE=Batalov;370656]PM user Prime95 and you will be helped.[/QUOTE]
Thanks very much for this hint. Maybe he will also solve my old problem with M332,224,379.
[QUOTE]That's not useful to you. The second GPU result will be given no credit.[/QUOTE]Yes, and that assignment went away on its own.

MikeBerlin 2014-04-10 15:42

[QUOTE=Batalov;370656]PM user Prime95 and you will be helped.[/QUOTE]YES, he did it! :smile: Many thanks again. :smile:

diep 2014-04-25 10:13

hi! Sorry to jump in on this thread.

How efficiently does the code run?

[url]https://developer.nvidia.com/cuFFT[/url]

I see there that, at somewhat larger transforms, cuFFT reaches under 100 Gflop/s on an M2090 Tesla. I haven't checked out the code yet - I will soon. This Tesla delivers 666 Gflop/s. Not counting fused multiply-adds (I haven't checked yet whether their code uses them - assuming not), that is 333 Gflop/s, so an efficiency of around 30%.
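diep's back-of-the-envelope figure can be reproduced in a few lines. The 666 and ~100 Gflop/s numbers are the ones quoted in the post, and halving the peak when fused multiply-adds are not counted is his stated assumption, not a measurement:

```python
def fft_efficiency(achieved_gflops, peak_gflops, count_fma=False):
    """Efficiency estimate as in the post: if fused multiply-adds are not
    counted, the usable peak is taken as half the quoted figure."""
    usable = peak_gflops if count_fma else peak_gflops / 2
    return achieved_gflops / usable

# M2090 numbers quoted in the post: 666 Gflop/s peak, ~100 Gflop/s cuFFT
print(f"{fft_efficiency(100.0, 666.0):.0%}")  # about 30%
```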

How is the efficiency for CUDALucas at somewhat larger transforms?

Interested in GPGPU FFT for Riesel :)

flashjh 2014-04-28 04:24

CUDALucas 2.05Beta r67 is [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]posted[/URL] for Windows. CUDA 4.2, 5.0, 5.5 and 6.0

CUDA 6.0 Libs are [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL]

[QUOTE]r67, just uploaded, includes a facility for backing up fft.txt files with a timestamp. I've included a README and CUDALucas.ini with a few updates. The README has a rough draft of a new section on command line options and tuning. There are still many additions and other changes to be made...[/QUOTE]

firejuggler 2014-04-28 06:31

1 Attachment(s)
[code]
CUDALucas_205Beta_CUDA6.0-x64_r67.exe -cufftbench 1 4096 1

------- DEVICE 0 -------
name GeForce GTX 750 Ti
Compatibility 5.0
clockRate (MHz) 1110
memClockRate (MHz) 2700
totalGlobalMem 2147483648
totalConstMem 65536
l2CacheSize 2097152
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 5
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1

Using threads: square 256, splice 128.
[/code]


All times are UTC.