mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

kladner 2015-01-18 05:17

[QUOTE=wombatman;392742]Core and RAM are at stock values (797 MHz and 975 MHz, respectively, according to GPU-Z). It's an EVGA 570 GTX base model (so no factory overclocking or anything like that).

I'm not sure what -st2 is, but I usually (though not always) am unable to complete a full -threadbench or -cufftbench. Full being from 1 to 8192 with 2-5 repetitions.

I'll try downclocking the memory by 50 and see if that helps.[/QUOTE]

This class of video card is built for gaming, where a few graphical errors are negligible. When I first experimented with CuLu, it was probably on the 460 I'm running now. The RAM defaults to 1800 (900) MHz (remembering DDR memory). I am fairly sure that I only got the Self Test and Extended Self Test (-st, -st2) to complete successfully at 1700 (850). A 50 MHz reduction in base clock sounds like a good start for testing.

(It's been a while since I ran CuLu. I'm sure I'll hear about it from the local deities if I misstated anything above.) :ermm:

wombatman 2015-01-18 05:28

I'll try the self-tests as well (pretty sure it's -r 0 and -r 1). I also tried increasing the TdrDelay in the registry from 0 to 8. That just made the screen freeze for ~8 seconds when the driver stopped.

kladner 2015-01-18 05:47

Sorry for the vague or erroneous parts. The point is to get the lesser and greater self-tests to complete. At the time that I was experimenting with CUDALucas, one of the heavy hitters around here said he was running his cards at 1600, which might have meant from 200-400 MHz below the stock RAM clock, depending on the particular card. It seemed at the time that CuLu might be a definitive test of VRAM precision in a mathematical setting.

LaurV has experience with Tesla cards. They tend to run both GPU and RAM slower than consumer cards, besides having ECC memory. They are also scary expensive.

wombatman 2015-01-18 06:20

It's funny. It passed the short self-test (-r 0), and has had only one hiccup on the longer self-test (-r 1) so far. The one hiccup (a mismatched residue) passed fine when it repeated the test immediately after. The long test hasn't finished yet, but we'll see what it does.

The crazy thing is that this is all at stock clock.

Edit: Update--got through the self-test with only the one hiccup. Tried to rerun threadbench. Failed at FFT of 3780K. Dropped memory speed by 100 MHz and ran threadbench again. Failed again at about the same FFT. No idea what's going on.

monsted 2015-01-31 14:06

CUDA 6.5 compile
 
Hiya.

I've compiled a CUDALucas binary for my GTX970. If anyone wants to try it out, i've uploaded it to [url]http://monsted.dk/CUDALucas2.05Beta-6.5-x64.zip[/url]

From what i can see, it works as intended, but i'd be interested to hear if the rest of you agree.

monsted 2015-02-04 12:57

[QUOTE=monsted;394095]Hiya.

I've compiled a CUDALucas binary for my GTX970. If anyone wants to try it out, i've uploaded it to [url]http://monsted.dk/CUDALucas2.05Beta-6.5-x64.zip[/url]

From what i can see, it works as intended, but i'd be interested to hear if the rest of you agree.[/QUOTE]

Guess not. It reported no residue on a "doublecheck" work unit and marked it as a prime. Anyone able to make a functioning CUDA 6.5 compile? :)

flashjh 2015-02-04 13:00

I can make one, give me a bit, I'll post it when it's done.

wombatman 2015-02-04 14:20

1 Attachment(s)
Here's a 64-bit CUDA 7.0 version for Windows if you'd like to try it. You'll need the cufft dll (it's waaaay too big to attach here). The cudart dll is included.

flashjh 2015-02-04 16:34

[QUOTE=monsted;394411]Guess not. It reported no residue on a "doublecheck" work unit and marked it as a prime. Anyone able to make a functioning CUDA 6.5 compile? :)[/QUOTE]

2.05 Beta CUDA 6.5 x64 is [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]here[/URL] (passed self test)

I can also build the CUDA 7.0 binaries, but 7.0 is still Release Candidate, so if you experience bugs...

CUDA Libs are [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL]. I'll upload the 7.0 libs when 7.0 is final.

wombatman 2015-02-04 17:26

Flash, is that a windows build, and if so, what card and driver version are you using? I've never been able to continuously run CUDALucas on Windows--always run into that "Driver stopped responding and restarted" TdrDelay error that nobody can pin down.

kladner 2015-02-04 17:40

[QUOTE=wombatman;394431]Flash, is that a windows build, and if so, what card and driver version are you using? I've never been able to continuously run CUDALucas on Windows--always run into that "Driver stopped responding and restarted" TdrDelay error that nobody can pin down.[/QUOTE]

IIRC, people were working around that problem with a looping batch file which restarts CuLu.

flashjh 2015-02-04 18:54

Yes, the batch file works well. It's been a while but the driver issue popped up a while ago. Someone reported the problem to nVidia but they haven't done anything about it.

I tested this on a GTX 580 with driver 347.12. I don't have it running full time right now, but when I did, a batch file always kept it going.

Something simple:

[CODE]:Beginning
CUDALucas.exe
Goto Beginning[/CODE]

wombatman 2015-02-04 18:55

Yeah, I had used the batch file method too. Just wondered if there was a way to just have it run without crashing. Thanks!

flashjh 2015-02-05 02:40

[QUOTE=wombatman;394445]Yeah, I had used the batch file method too. Just wondered if there was a way to just have it run without crashing. Thanks![/QUOTE]

Take a look at [URL="http://www.mersenneforum.org/showthread.php?p=364436#post364436"]this[/URL]. It has been over a year since I did the testing and I don't know how CUDA 6.5 will do with everything. Either way, it really appears to be a driver issue.

Let me know what you find... :smile:

BTW - Yes, my binaries are for Windows.

wombatman 2015-02-05 04:46

Yeah, I'll try and run through paces. The little bit I've found so far (and you may already know all this) is that the error comes from the Timeout Detection and Recovery ([url]http://http.developer.nvidia.com/NsightVisualStudio/2.2/Documentation/UserGuide/HTML/Content/Timeout_Detection_Recovery.htm[/url]). So basically, if the GPU stops responding for 2 seconds (by default), Windows restarts the driver, which is the error we get.

You can increase the delay time either through the registry or using NSight Monitor (under Options/General). Increasing it to 20 seconds got me through the benchmark of -r 0 and up to 5000K on the cufftbench 1 8192 6. But I still hung and crashed. The error given is below:
[CODE]CUDALucas.cu(2366) : cudaSafeCall() Runtime API error 30: unknown error.
CUDALucas.cu(1049) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.[/CODE]

The first line, 2366, refers to: [CODE]cutilSafeCall (cudaEventRecord (stop, 0));
err = cutilSafeCall1(cudaEventSynchronize (stop));[/CODE]

The second, 1049, refers to: [CODE]cutilSafeCall (cudaMalloc ((void **) &g_x, sizeof (double) * n));//size_d));[/CODE] in alloc_gpu_mem function.

So maybe there is an issue with the synchronizing that causes a hang/error, and when you try to fix it by restarting the device, everything happens too quickly and you can't do the memory allocation? As you might imagine, I'm totally guessing here.


Edit: Also, it's worth mentioning that when I turned TDR off completely, I still got a hang from CUDALucas. So TDR is not directly responsible (I think) for the error. It's just what we see when the driver restarts. Also, the point at which CUDALucas hangs is inconsistent, even with the same command line parameters.

owftheevil 2015-02-05 12:47

Here's what I know about the bug.

It hangs during a cufft call.
It is specific to compute 2.0 cards.
It is most likely not a problem with cufft:
cufft4.2 with Nvidia driver 295.?? works
cufft4.2 with >30?.?? drivers show the bug


In Linux, we can recover by resetting the device inside CUDALucas.


In Windows, the devices are deactivated after the timeout
so instead of continuing merrily on our way, we get that
memory allocation error you are seeing. CUDALucas needs
to be restarted to continue.

monsted 2015-02-05 13:23

[QUOTE=flashjh;394429]2.05 Beta CUDA 6.5 x64 is [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]here[/URL] (passed self test)

I can also build the CUDA 7.0 binaries, but 7.0 is still Release Candidate, so if you experience bugs...

CUDA Libs are [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL]. I'll upload the 7.0 libs when 7.0 is final.[/QUOTE]

Thanks! Reworking that doublecheck now (38 hours to do M38635771,71,1 on a GTX970).

wombatman 2015-02-05 14:17

[QUOTE=owftheevil;394530]Here's what I know about the bug.

It hangs during a cufft call.
It is specific to compute 2.0 cards.
It is most likely not a problem with cufft:
cufft4.2 with Nvidia driver 295.?? works
cufft4.2 with >30?.?? drivers show the bug


In Linux, we can recover by resetting the device inside CUDALucas.


In Windows, the devices are deactivated after the timeout
so instead of continuing merrily on our way, we get that
memory allocation error you are seeing. CUDALucas needs
to be restarted to continue.
[/QUOTE]

How odd that it is specific to 2.0 cards (which is what I have). For what it's worth, I started a CUDALucas run overnight (M65911957 with FFT size of 3584K), and at least as far back as I can see, which is around 2-3 hours, it has not errored out. So maybe increasing the TdrDelay registry value helps?

flashjh 2015-02-05 14:44

[QUOTE=wombatman;394540]How odd that it is specific to 2.0 cards (which is what I have). For what it's worth, I started a CUDALucas run overnight (M65911957 with FFT size of 3584K), and at least as far back as I can see, which is around 2-3 hours, it has not errored out. So maybe increasing the TdrDelay registry value helps?[/QUOTE]

If you use this batch file it will count the restarts and put the number in the title of the window. You can also send it to a log, if you want.
[CODE]
@echo off
Set count=0
Set program=CUDALucas2.05Beta-CUDA5.0-Win32-r60
:loop
TITLE %program% Current Reset Count = %count%
Set /A count+=1
rem echo %count% >> log.txt
rem echo %count%
%program%.exe
GOTO loop
[/CODE]
For what it's worth, I did a lot of testing and found that the restart problem, though irritating, didn't affect the results. So once you get it going, you should be ok. It was a hassle when trying to set up the cufftbench though.
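
For anyone who prefers a script to a batch file, the same restart-and-count idea can be sketched in Python. This is only an illustration and not part of CUDALucas: the name `run_with_restarts`, the restart cap, and the assumption that a clean finish returns exit code 0 are hypothetical (the batch file above simply loops forever regardless of exit code).

```python
import subprocess
import sys


def run_with_restarts(command, max_restarts):
    """Run `command`, starting it again after every non-zero exit.

    Returns the list of exit codes seen, so len(result) - 1 is the
    restart count that the batch file shows in the window title.
    Assumption (not taken from CUDALucas): exit code 0 means "finished".
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        completed = subprocess.run(command)
        exit_codes.append(completed.returncode)
        if completed.returncode == 0:
            break  # clean exit: stop restarting
    return exit_codes


# Hypothetical usage on a real system:
#   codes = run_with_restarts(["CUDALucas.exe"], max_restarts=1000)
#   print("restarts:", len(codes) - 1)
```

Capping the restarts is a design choice of this sketch; it keeps a binary that fails immediately from spinning in a tight loop.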

flashjh 2015-02-05 15:07

[STRIKE][QUOTE=owftheevil;394530]
It is specific to compute 2.0 cards.[/QUOTE]

The 970 is [URL="http://en.wikipedia.org/wiki/CUDA#Supported_GPUs"]CC 5.2[/URL], should it be affected?

@wombatman, do you want a CUDA 6.5, CC 5.2 only build to see if it's any faster?[/STRIKE]
Never mind... got people mixed up

Us old 2.0 card holders need to get with the times :smile:

wombatman 2015-02-05 15:47

I only have a CC 2.0 card, so I wouldn't be able to run it, unfortunately.

flashjh 2015-02-05 16:08

[QUOTE=monsted;394533]Thanks! Reworking that doublecheck now (38 hours to do M38635771,71,1 on a GTX970).[/QUOTE]

@monsted, do you want a CUDA 6.5, CC 5.2 only build to see if it's any faster?

I put them on [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL] for you, if you want to try.

monsted 2015-02-06 10:59

[QUOTE=flashjh;394554]@monsted, do you want a CUDA 6.5, CC 5.2 only build to see if it's any faster?

I put them on [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL] for you, if you want to try.[/QUOTE]
Tried it out, but it doesn't seem to have made any noticeable difference. ms/It is just about 3.6080 with both binaries.

It does cut down the size of the binary, so i'm guessing it just doesn't carry the cores it wouldn't use anyway?

flashjh 2015-02-06 11:02

[QUOTE=monsted;394685]Tried it out, but it doesn't seem to have made any noticeable difference. ms/It is just about 3.6080 with both binaries.

It does cut down the size of the binary, so i'm guessing it just doesn't carry the cores it wouldn't use anyway?[/QUOTE]

Yes this binary was built only for 5.2. I tried the self test on a 580, it runs but gives all 0 residues.

wombatman 2015-02-06 13:48

Just as a quick follow-up, increasing the TdrDelay from 8 secs to 20 secs definitely helps. I'm pretty sure (based on a few telltale signs, but without absolute certainty) that I still get a Timeout Recovery at some point, but they're few and far between.

flashjh 2015-02-06 14:37

1 Attachment(s)
[QUOTE=wombatman;394693]Just as a quick follow-up, increasing the TdrDelay from 8 secs to 20 secs definitely helps. I'm pretty sure (based on a few telltale signs, but without absolute certainty) that I still get a Timeout Recovery at some point, but they're few and far between.[/QUOTE]

Go into the registry and modify the key manually: change it to 128 (decimal); see the attached picture for the location. Restart the system and try again. Use the batch file I posted to track whether it's still hanging.

Modify the "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" TdrDelay DWORD
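
If editing the value by hand is error-prone, the same change can be written as a .reg file. The sketch below only builds the text of such a file; the function name `tdr_delay_reg_text` is hypothetical, while the key path and the `TdrDelay` DWORD name are the ones given above. Note that 128 decimal is 0x80, which is how the value appears in the file.

```python
def tdr_delay_reg_text(seconds):
    """Return the text of a .reg file that sets TdrDelay to `seconds`.

    The value is stored as a 32-bit DWORD, written as 8 hex digits.
    """
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("TdrDelay must fit in a 32-bit DWORD")
    key = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key}]\r\n"
        f'"TdrDelay"=dword:{seconds:08x}\r\n'
    )


# Example: print the file contents for a 128 second delay.
print(tdr_delay_reg_text(128))
```

Saving the output to a file (say `tdr.reg`, a made-up name), importing it with administrator rights, and restarting the system should be equivalent to the manual edit.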

flashjh 2015-02-06 15:32

[QUOTE=TheJudger;383727]Is there a common benchmark for CUDALucas?
[/QUOTE]

owftheevil, I'm working on going through the old messages here to piece together anything that needs to be done to get CUDALucas 2.05 out of beta. I'm trying to complete the README and came across this post.

What is the best combination cufftbench and threadbench for a good benchmark & burn-in? I'd also like to include some memtest stuff.

wombatman 2015-02-06 15:49

[QUOTE=flashjh;394698]Go into the registry and modify the key manually, change it to 128 (dec) see attached picture for the location. Restart the system and try again. Use the batch file I posted to track and see if it's still hanging or not.

Modify the "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" TdrDelay DWORD[/QUOTE]

I'll try that tonight and see how it does.

owftheevil 2015-02-06 20:32

A good common benchmark would be something along the lines of

[CODE]./CUDALucas -cufftbench 1024 8192 5[/CODE]

That should cover most of the lengths currently being used and give a good enough idea of the card's speed for comparisons.


The -threadbench option is intended for fine tuning the card for a few particular fft lengths that will be used often.

For a burn in, the quick residue test followed by a doublecheck of an exponent that is not too close to an fft boundary would be sufficient. The <Gpu> fft.txt file gives fft boundaries. If someone was too impatient for that, they should do at least the long residue check. Many hours of the memory test are prescribed if any errors show up.

flashjh 2015-02-06 21:29

So for new card\new setup burn-in, would you think this is sufficient to recommend for optimal setup and testing?:

[CODE]CUDALucas -cufftbench 1024 8192 5
CUDALucas -threadbench 1024 8192 5
CUDALucas -r 1
CUDALucas 6972593[/CODE]

That LL only takes about an hour on a 580, so most would be patient enough for that. Do you have a better exponent in mind?

flashjh 2015-02-06 22:31

I committed r72 to sourceforge.

owftheevil, can you take a last look at the README, CHANGELOG and the CUDALucas.ini files? The only change to code was for -h.

Let me know when you've looked over and updated for final release.

Thanks!

owftheevil 2015-02-06 23:02

I'll get to it this evening.

Just noticed the other post. The threadbench option needs another parameter, usually 0 or 1 at the end. The tests you propose sound good enough.

flashjh 2015-02-07 01:24

[QUOTE=owftheevil;394756]I'll get to it this evening.

Just noticed the other post. The threadbench option needs another parameter, usually 0 or 1 at the end. The tests you propose sound good enough.[/QUOTE]

Fixed

owftheevil 2015-02-07 01:53

A first reading of the README found two typos and some musings of mine I would like deleted. I'll go through it once again more carefully. Overall it looks very good.

flashjh 2015-02-07 02:49

On mersenne.ca ([url]http://www.mersenne.ca/cudalucas.php?sort=ghdpd[/url])

James requests the results of this command:

CUDALucas -info -cufftbench 1024 8192 1024 >> benchmark.txt

Now it should be this, right?: CUDALucas -cufftbench 1024 8192 5 >> benchmark.txt

[QUOTE=owftheevil;394770]A first reading of the README found two typos and some musings of mine I would like deleted. I'll go through it once again more carefully. Overall it looks very good.[/QUOTE]

When you're done with your changes and all, let me know :smile:

owftheevil 2015-02-07 02:57

Yes, it should be as you state. He's using the syntax for 2.03 which used the third parameter for the jump between fft values. I'll pm you in the morning about my suggestions on the README.

James Heinrich 2015-02-07 13:46

[QUOTE=flashjh;394771][url]http://www.mersenne.ca/cudalucas.php[/url]
CUDALucas -info -cufftbench 1024 8192 1024 >> benchmark.txt
Now it should be this, right?:
CUDALucas -cufftbench 1024 8192 5 >> benchmark.txt[/QUOTE]I have changed 1024 -> 5 per your suggestion.
I'm not familiar with CUDALucas (and the only computer I have access to at the moment is powered by i945GME), but I think the -info added some details about the GPU, has that been deprecated?

owftheevil 2015-02-07 14:15

Some card info is posted in the fft.txt file. The beginning of a typical such file looks like this:

[CODE]
Device GeForce GTX 570
Compatibility 2.0
clockRate (MHz) 1464
memClockRate (MHz) 1900

fft max exp ms/iter
1024 19535569 1.7232
1080 20580341 2.0214
1120 21325891 2.0459
1152 21921901 2.0549
1176 22368691 2.2586
[/CODE]The clockRate and memClockRate numbers are deceptive since they are pre-load numbers.

I would be glad to put in more info if it would be useful.
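
As a rough illustration of how the table in fft.txt can be used to check how close an exponent sits to an fft boundary, here is a small Python sketch. It assumes the three-column layout shown above (fft length in K, max exponent, ms/iter) and that "max exp" is the largest exponent intended for that length; the function names are made up for this example and are not part of CUDALucas.

```python
SAMPLE = """\
Device GeForce GTX 570
Compatibility 2.0
clockRate (MHz) 1464
memClockRate (MHz) 1900

fft max exp ms/iter
1024 19535569 1.7232
1080 20580341 2.0214
1120 21325891 2.0459
"""


def parse_fft_table(text):
    """Return [(fft_length_k, max_exponent, ms_per_iter), ...]."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank lines and the column header
        try:
            rows.append((int(fields[0]), int(fields[1]), float(fields[2])))
        except ValueError:
            continue  # device info lines such as "clockRate (MHz) 1464"
    return rows


def smallest_fft_for(rows, exponent):
    """Pick the shortest fft length whose max exponent covers `exponent`.

    Returns (fft_length_k, headroom), where headroom is how far the
    exponent sits below that boundary, or None if no listed length fits.
    """
    for fft_k, max_exp, _ms in sorted(rows):
        if exponent <= max_exp:
            return fft_k, max_exp - exponent
    return None


rows = parse_fft_table(SAMPLE)
print(smallest_fft_for(rows, 20000000))
```

A small headroom value would flag an exponent that is close to a boundary and therefore a poor choice for a burn-in doublecheck.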

ET_ 2015-02-07 15:46

Is there a version 2.05 for Linux working with the GTX980?
Or should I wait for a 2.06 with more info as owftheevil just hinted?

Luigi

flashjh 2015-02-07 16:06

2.05 will work with Windows and Linux on the 980, should be out of beta today.

owftheevil 2015-02-07 16:43

@ET To compile with cc5.2, CUDA 7.0 is required. I can upload the binary to Sourceforge if you want it. Not having a cc5.2 card myself, it hasn't been tested.

flashjh 2015-02-07 16:51

Does the CC 5.0 CUDA 6.5 version not work on the 980?

I am able to compile 5.2 with CUDA 6.5 on Windows, does Linux not?

owftheevil 2015-02-07 17:09

Here's what I get when I try with 6.5:
[CODE]
[filbert@archfilbert trunk]$ make
/usr/local/cuda-6.5/bin/nvcc -O3 --generate-code arch=compute_52,code=sm_52 --compiler-options=-Wall -I/usr/local/cuda-6.5/include -c CUDALucas.cu
nvcc fatal : Unsupported gpu architecture 'compute_52'
Makefile:26: recipe for target 'CUDALucas.o' failed
make: *** [CUDALucas.o] Error 1
[/CODE]

flashjh 2015-02-07 17:30

[QUOTE=owftheevil;394819]Here's what I get when I try with 6.5:
[CODE]
[filbert@archfilbert trunk]$ make
/usr/local/cuda-6.5/bin/nvcc -O3 --generate-code arch=compute_52,code=sm_52 --compiler-options=-Wall -I/usr/local/cuda-6.5/include -c CUDALucas.cu
nvcc fatal : Unsupported gpu architecture 'compute_52'
Makefile:26: recipe for target 'CUDALucas.o' failed
make: *** [CUDALucas.o] Error 1
[/CODE][/QUOTE]

Interesting... I just tried again and it does work for Windows. I stand corrected, sorry. For Linux you'll need CUDA 7.0 but either way 2.05 will work with Windows and Linux.

owftheevil 2015-02-07 17:54

flashjh, you were also correct in that the 980 should work with the cc5.0 version. It just might not be optimal.

owftheevil 2015-02-07 18:09

Version built for cuda 4.2 - 7.0 and some quick benchmarks for comparisons of the different cuda versions on Linux:
[COLOR=#000000][FONT=sans-serif][CODE]
[filbert@archfilbert trunk]$ ./CUDALucas-4.2 57885161
Using threads: square 256, splice 512.
Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 07 09:41:08 | M57885161 10000 0x76c27556683cd84d | 3136K 0.21764 2.2289 22.28s | 1:11:50:02 0.01% |
| Feb 07 09:41:30 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.22456 2.2291 22.29s | 1:11:49:46 0.03% |

[/CODE]
[CODE]
[filbert@archfilbert trunk]$ ./CUDALucas-5.0 57885161

Using threads: square 256, splice 512.
Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 07 09:42:51 | M57885161 10000 0x76c27556683cd84d | 3136K 0.25391 2.0388 20.38s | 1:08:46:35 0.01% |
| Feb 07 09:43:12 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.27148 2.0388 20.38s | 1:08:46:16 0.03% |

[/CODE]
[CODE]
[filbert@archfilbert trunk]$ ./CUDALucas-5.5 57885161

Using threads: square 256, splice 512.
Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 07 09:44:07 | M57885161 10000 0x76c27556683cd84d | 3136K 0.23438 2.0287 20.28s | 1:08:36:55 0.01% |
| Feb 07 09:44:27 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.23535 2.0309 20.30s | 1:08:37:38 0.03% |

[/CODE]
[CODE]
[filbert@archfilbert trunk]$ ./CUDALucas-6.0 57885161

Using threads: square 256, splice 512.
Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 07 09:45:23 | M57885161 10000 0x76c27556683cd84d | 3136K 0.26562 2.0886 20.88s | 1:09:34:40 0.01% |
| Feb 07 09:45:44 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.23828 2.0881 20.88s | 1:09:34:05 0.03% |

[/CODE]
[CODE]
[filbert@archfilbert trunk]$ ./CUDALucas-6.5 57885161

Using threads: square 256, splice 512.
Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 07 09:46:35 | M57885161 10000 0x76c27556683cd84d | 3136K 0.23828 2.0607 20.60s | 1:09:07:44 0.01% |
| Feb 07 09:46:56 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.24805 2.0604 20.60s | 1:09:07:17 0.03% |

[/CODE]
[CODE]
[filbert@archfilbert trunk]$ ./CUDALucas-7.0 57885161

Using threads: square 256, splice 512.
Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 07 09:47:44 | M57885161 10000 0x76c27556683cd84d | 3136K 0.25000 2.0633 20.63s | 1:09:10:14 0.01% |
| Feb 07 09:48:05 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.25000 2.0631 20.63s | 1:09:09:47 0.03% |

[/CODE]
[/FONT][/COLOR]
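
The ETA column in these logs can be cross-checked by hand. A Lucas-Lehmer test of M(p) takes p - 2 squarings, so the remaining time is roughly the remaining iterations multiplied by ms/It. The sketch below does that arithmetic; the function names are hypothetical, and it is an assumption that CUDALucas computes its ETA in essentially this way.

```python
def eta_seconds(exponent, iterations_done, ms_per_iter):
    """Estimate the remaining run time of a Lucas-Lehmer test in seconds."""
    remaining = (exponent - 2) - iterations_done
    return remaining * ms_per_iter / 1000.0


def format_eta(seconds):
    """Format seconds as d:hh:mm:ss, the layout of the ETA column above."""
    seconds = int(round(seconds))
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}:{hours:02d}:{minutes:02d}:{secs:02d}"


# First log line of the CUDA 4.2 run: M57885161, 10000 iterations, 2.2289 ms/It.
print(format_eta(eta_seconds(57885161, 10000, 2.2289)))
```

For the CUDA 4.2 line this lands within a few seconds of the logged 1:11:50:02; the small gap is expected because ms/It is rounded in the log. The same ms/It figures suggest the CUDA 5.5 build spends roughly 9% less time per iteration than the 4.2 build on this card.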

wombatman 2015-02-07 18:15

[QUOTE=flashjh;394698]Go into the registry and modify the key manually, change it to 128 (dec) see attached picture for the location. Restart the system and try again. Use the batch file I posted to track and see if it's still hanging or not.

Modify the "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" TdrDelay DWORD[/QUOTE]

Did this last night with TdrDelay set to 20 (since I already had that and was running other things). It restarted 45 times(!) overnight, which is pretty much the opposite of what I was seeing running it under MinGW64. So I'm going to run it under MinGW64 and see how many times it restarts using a bash script.

flashjh 2015-02-07 18:26

[QUOTE=wombatman;394831]...which is pretty much the opposite of what I was seeing running it under MinGW64.[/QUOTE]
This is the 1st I've heard of MinGW64. What is it and how did you use it to run CUDALucas?

flashjh 2015-02-07 18:27

[QUOTE=owftheevil;394829]Version built for cuda 4.2 - 7.0 and some quick benchmarks for comparisons of the different cuda versions on Linux:[/QUOTE]

Awesome, I'll get several versions tested on Windows, as well. I am hesitant to release CUDA 7 versions until it's out of RC.

ET_ 2015-02-07 18:35

Many thanks to owftheevil and flashjh :bow:

I will wait for the 7.0 and try to recompile under Linux when ready. I assume the source is on SourceForge.

Luigi

owftheevil 2015-02-07 18:49

[QUOTE=flashjh;394833]Awesome, I'll get several versions tested on Windows, as well. I am hesitant to release CUDA 7 versions until it's out of RC.[/QUOTE]
Agreed. That was a specific offer for Luigi.

wombatman 2015-02-07 19:14

[QUOTE=flashjh;394832]This is the 1st I've heard of MinGW64. What is it and how did you use it to run CUDALucas?[/QUOTE]

I'm no expert on it, but it's basically a Windows-based Linux-y environment. I've primarily used it to compile things like GMP, GMP-ECM, MSieve, and the like and then run them. At this point, there are a number of individual toolchain builds, but here's a main website for it: [url]http://mingw-w64.sourceforge.net/[/url]

flashjh 2015-02-08 03:41

Here are my timings with 2.05 on a GTX580

CUDA 5.5
[CODE]Using threads: square 256, splice 128.
Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 07 20:36:39 | M57885161 10000 0x76c27556683cd84d | 3136K 0.24805 4.7168 47.16s | 3:03:49:48 0.01% |
| Feb 07 20:37:26 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.23438 4.7092 47.09s | 3:03:45:21 0.03% |[/CODE]

CUDA 6.0
[CODE]Using threads: square 256, splice 128.
Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 07 20:32:02 | M57885161 10000 0x76c27556683cd84d | 3136K 0.25000 4.8228 48.22s | 3:05:32:02 0.01% |
| Feb 07 20:32:50 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.25000 4.8281 48.28s | 3:05:33:48 0.03% |[/CODE]

CUDA 6.5
[CODE]Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 07 20:34:24 | M57885161 10000 0x76c27556683cd84d | 3136K 0.22266 4.8867 48.86s | 3:06:33:43 0.01% |
| Feb 07 20:35:13 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.23682 4.8775 48.77s | 3:06:28:26 0.03% |[/CODE]

Your times are ½ mine, what card are you using?

wombatman 2015-02-08 04:40

[QUOTE=wombatman;394831]Did this last night with TdrDelay set to 20 (since I already had that and was running other things). It restarted 45 times(!) overnight, which is pretty much the opposite of what I was seeing running it under MinGW64. So I'm going to run it under MinGW64 and see how many times it restarts using a bash script.[/QUOTE]

@flash, I've been running CUDALucas under MinGW for at least 3 hours (probably more like 5 or 6), and I don't have a single restart. No idea what the difference is between the command prompt and MinGW, but MinGW with TdrDelay=20 is pretty much continuously stable. I'll be running CUDALucas overnight to get a longer term test (and will note exactly when I start and stop).

flashjh 2015-02-08 12:02

CUDALucas 2.05 Final Release
 
After many months and a lot of work CUDALucas 2.05 is released. You can find the binaries and the lib files [URL="https://sourceforge.net/projects/cudalucas/files/"]here[/URL] for Windows and Linux.

New features in 2.05:
[CODE]- RCB (round - carry - balance, ~1% decrease in iteration times)
- On-the-fly FFT selection. Keyboard driven or automatic if error level exceeds threshold
- Included GPU memtest and tools to automate FFT finetuning and thread selection
- Bit shift to prevent errors from producing similar results[/CODE]

Because of the time it took to get this to release, most of the code changes have been extensively tested, but before you go production, please allow the following tests to complete (should take a couple hours on a 580):

[CODE]CUDALucas -cufftbench 1024 8192 5
CUDALucas -threadbench 1024 8192 5 0
CUDALucas -r 1
CUDALucas 6972593[/CODE]

If all these tests won't complete, take a look at your card timings and also look at running a memtest on the card (see readme).
(if threadbench won't complete on 2.0 cards due to the driver issue, try changing the 0 to a 1 or skip that test)

You can also follow the instructions [URL="http://www.mersenne.ca/cudalucas.php"]here[/URL] and send updated data to mersenne.ca

If you have contributed to CUDALucas, please contact flashjh and owftheevil
on mersenneforum.org so we can add you to the list of developers. Please
include your contribution to CUDALucas so it can be added to this README.

To automate CUDALucas on Windows you can use [url]http://mersenneforum.org/misfit/downloads/MISFIT-CULU/[/url]

Thanks to everyone for making this happen! Happy hunting!
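
For those who want to script the four pre-production tests listed above, here is a minimal Python sketch. The command lines are the ones from this post; everything else (the function names, the injectable `runner`, and the idea that a non-zero exit code signals failure) is an assumption made for illustration.

```python
import subprocess


def burn_in_commands(binary="CUDALucas", threadbench_mode=0):
    """Return the four recommended test commands as argument lists.

    `threadbench_mode` is the trailing 0 or 1 of the -threadbench line.
    """
    return [
        [binary, "-cufftbench", "1024", "8192", "5"],
        [binary, "-threadbench", "1024", "8192", "5", str(threadbench_mode)],
        [binary, "-r", "1"],
        [binary, "6972593"],
    ]


def run_burn_in(commands, runner=subprocess.call):
    """Run the commands in order; return the first one that fails, or None.

    Assumption: `runner` returns 0 on success, as subprocess.call does
    for a process that exits cleanly.
    """
    for command in commands:
        if runner(command) != 0:
            return command
    return None


# Hypothetical usage on a machine with the binary in the current folder:
#   failed = run_burn_in(burn_in_commands("./CUDALucas"))
#   print("all passed" if failed is None else f"failed: {failed}")
```

Passing the runner as a parameter keeps the sketch easy to dry-run without a GPU.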

mognuts 2015-02-08 13:40

Excellent work! Congratulations to all concerned. Is there a CUDA 5.0 version for 64bit windows? I can't see it in the list, although there is a 5.0 version for Linux on sourceforge. A CUDA 5.0 version would be useful because it avoids the crashing problem, when used with the appropriate driver. Also, can you confirm that this is revision r78?

mognuts

flashjh 2015-02-08 15:14

[QUOTE=mognuts;394900]Excellent work! Congratulations to all concerned. Is there a CUDA 5.0 version for 64bit windows? I can't see it in the list, although there is a 5.0 version for Linux on sourceforge. A CUDA 5.0 version would be useful because it avoids the crashing problem, when used with the appropriate driver. Also, can you confirm that this is revision r78?

mognuts[/QUOTE]
Thanks...
I uploaded the files. The 5.0 and 4.2 wouldn't compile without a small change.

The current revision is 79 and all is up to date.

If anyone finds anything during a run or you have suggestions for new changes, let us know!

flashjh 2015-02-11 13:09

CUDALucas 2.05.1 is posted to [URL="https://sourceforge.net/projects/cudalucas/files/"]sourceforge[/URL]. An error was discovered in the display output if ReportIterations=100 or 50 or 10. The error only caused the display 'Error' to stay at .25000. Actual results were not affected. I uploaded all windows versions as one file this time. You still need the .ini file.

If anything else is found, let us know.

LaurV 2015-02-11 17:11

Nice job!

MacFactor 2015-02-15 21:51

Thanks. I was able to use the benchmark pages at [url]http://www.mersenne.ca/cudalucas.php[/url] to pick out a couple of inexpensive cards, and have *finally* got mfaktc up and running under Linux Mint (circumventing Nouveau got to be a real saga in itself). Unfortunately ... when I try running CUDALucas on the same system I get the error message "no such file or folder" !! I can see the file with ls -lt and yes I do own it and have permission to read, write, and execute it. I can move/rename it, compress it, etc., but if I type the file name to run it as a command, I get "no such". If I type ./CUDA*64 the system even parses/expands the filename correctly, but still returns "no such file" !

owftheevil 2015-02-16 04:13

It's probably some silly thing you are overlooking. Some thoughts: Where did you extract the binaries to? What does the ./CUDA*64 expand to?

LaurV 2015-02-16 11:02

It may be a sys message trying to tell you that it tries to open a library which is not found. Do you have it? IIRC mfaktx needs only the runtime (cudart) but CUDALucas also needs the FFT lib (cufft) (so you will need both libs). Try getting the right one for your system and put it in the same folder.

chris2be8 2015-02-16 16:38

Is it a binary file or a script? Please post output from:
file "filename"
ls -l "filename"
(in both commands replace "filename" with the name of the file).
And output from trying to run it.

Chris
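
As a rough idea of what the `file` check above would reveal, the first few bytes of a file are enough to tell a script from a 32-bit or 64-bit ELF binary. The Python sketch below reads only that much; the function name `describe_elf` is made up, and it covers far less than the real `file` utility. As a general Linux observation (not a diagnosis of this case), "No such file or directory" for a file that clearly exists can mean the loader the binary asks for is missing, for example when a 64-bit binary is run on a 32-bit install.

```python
def describe_elf(header):
    """Classify the leading bytes of a file, very roughly like `file` does."""
    if header[:2] == b"#!":
        return "script"
    if header[:4] != b"\x7fELF":
        return "not an ELF binary"
    # Byte 4 of an ELF file (EI_CLASS) is 1 for 32-bit and 2 for 64-bit.
    ei_class = header[4] if len(header) > 4 else 0
    return {1: "ELF 32-bit", 2: "ELF 64-bit"}.get(ei_class, "ELF (unknown class)")


# Hypothetical usage on a real file:
#   with open("CUDALucas", "rb") as handle:
#       print(describe_elf(handle.read(16)))
```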

Mistejk 2015-02-19 16:51

I have a problem, when I run CUDALucas 1.2b with GTX 570 it runs fine for like ~30000 iterations and then my screen freezes. CUDALucas throws me the following error:
[CODE]CUDALucas.cu(695) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.[/CODE]
Also, in the system tray I have the following message:
[CODE]Display driver NVIDIA Windows Kernel Mode Driver, Version 344.75 stopped responding and has successfully recovered.[/CODE]
I haven't overclocked my GPU. I am using Windows 8.1 x64. I am running CUDALucas as follows:
[CODE]CUDALucas-2.03-cuda4.0-sm_20-x86-64.exe -t -c 1000 71050117[/CODE]

flashjh 2015-02-19 20:17

[QUOTE=Mistejk;395829]I have a problem...[/QUOTE]
Hello,

Please download the newest version from [URL="https://sourceforge.net/projects/cudalucas/"]here[/URL]. Once you have read through the README and gotten set up, let us know if you're still having a problem.

Jerry

Mistejk 2015-02-20 14:04

[QUOTE=flashjh;395854]Hello,

Please download the newest version from [URL="https://sourceforge.net/projects/cudalucas/"]here[/URL]. Once you have read through the README and gotten set up, let us know if you're still having a problem.

Jerry[/QUOTE]


Thanks, getting the newest version with CUDA 6.5 fixed the problem.

ET_ 2015-02-21 17:04

I just downloaded CUDALucas 2.05 for Linux (i.e. the multiple-executable folder) from sourceforge, but when I try to run it I get:

[code]
luigi@Moreware:~/luigi/CUDA/cudaLucas/CUDALucas-2.05$ ll
totale 2956
drwxrwxr-x 2 luigi luigi 4096 feb 21 17:56 ./
drwxr-xr-x 7 luigi luigi 4096 feb 21 17:55 ../
-rwxr-xr-x 1 luigi luigi 753136 feb 12 00:36 CUDALucas*
-rwxr-xr-x 1 luigi luigi 425256 feb 12 00:27 CUDALucas-2.05.1-CUDA4.2-linux-x86_64*
-rwxr-xr-x 1 luigi luigi 478576 feb 12 00:28 CUDALucas-2.05.1-CUDA5.0-linux-x86_64*
-rwxr-xr-x 1 luigi luigi 609904 feb 12 00:38 CUDALucas-2.05.1-CUDA5.5-linux-x86_64*
-rwxr-xr-x 1 luigi luigi 749040 feb 12 00:34 CUDALucas-2.05.1-CUDA6.0-linux-x86_64*

luigi@Moreware:~/luigi/CUDA/cudaLucas/CUDALucas-2.05$ ./CUDALucas
bash: ./CUDALucas: File o directory non esistente
[/code]

(In English: "No such file or directory".)

Any hints?

Luigi

paulunderwood 2015-02-21 17:18

It is strange to me to have a file ending with *, but try with the star at the end. :smile:

ET_ 2015-02-21 17:28

[QUOTE=paulunderwood;395997]It is strange to me to have a file ending with *, but try with the star at the end. :smile:[/QUOTE]

No way this time. The star visually marks the file as executable and is not part of the name :razz:

Luigi

Mark Rose 2015-02-21 17:30

.

ET_ 2015-02-21 17:32

[QUOTE=ET_;395998]No way this time. The star visually marks the file as executable and is not part of the name :razz:

Luigi[/QUOTE]

MacFactor (previous page) had the same problem I have.

[code]
luigi@Moreware:~/luigi/CUDA/cudaLucas/CUDALucas-2.05$ file CUDALucas
CUDALucas: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=a8b4728865a4f5a480dd218c33fd85728a4914c3, not stripped
[/code]

paulunderwood 2015-02-21 18:15

What is the output of:

[CODE]uname -a[/CODE]

:question:

ET_ 2015-02-21 18:18

[QUOTE=paulunderwood;396006]What is the output of:

[CODE]uname -a[/CODE]

:question:[/QUOTE]

[code]
Linux Moreware 3.13.0-45-generic #74-Ubuntu SMP Tue Jan 13 19:36:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[/code]

The libraries are present:

[code]
luigi@Moreware:~/luigi/CUDA/cudaLucas/CUDALucas-2.05$ ldd CUDALucas
linux-vdso.so.1 => (0x00007fffcf53c000)
libcufft.so.6.5 => /usr/local/cuda-6.5/lib64/libcufft.so.6.5 (0x00007f9324653000)
libcudart.so.6.5 => /usr/local/cuda-6.5/lib64/libcudart.so.6.5 (0x00007f9324403000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f93240e3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9323d1d000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9323b19000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f93238fa000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f93236f2000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f93233ee000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f93231d7000)
/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f932707a000)
[/code]

Luigi
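One more thing that may be worth checking when bash reports "No such file or directory" for a binary that clearly exists: the kernel returns that same error when the dynamic loader named inside the ELF file (its PT_INTERP entry) is missing. The last line of the ldd output above shows /lib/ld-linux-x86-64.so.2 resolved to /lib64/ld-linux-x86-64.so.2, which might mean the binary asks for a loader path that does not exist on this system. The Python sketch below is only an illustration and not part of CUDALucas; the function names are made up, and it handles only 64-bit little-endian ELF files.

```python
import os
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def elf_interpreter(data):
    """Return the PT_INTERP path stored in a 64-bit little-endian ELF image, or None."""
    # e_ident: magic bytes, class (2 = 64-bit), data encoding (1 = little-endian)
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        return None
    # ELF64 header: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56
    e_phoff = struct.unpack_from("<Q", data, 32)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type = struct.unpack_from("<I", data, off)[0]
        if p_type == PT_INTERP:
            # Elf64_Phdr: p_offset at +8, p_filesz at +32
            p_offset = struct.unpack_from("<Q", data, off + 8)[0]
            p_filesz = struct.unpack_from("<Q", data, off + 32)[0]
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None


def loader_report(path):
    """Describe whether the loader requested by the binary at `path` exists."""
    with open(path, "rb") as f:
        interp = elf_interpreter(f.read())
    if interp is None:
        return "no PT_INTERP entry found (static binary, or not a 64-bit LE ELF)"
    state = "exists" if os.path.exists(interp) else "MISSING"
    return "requested loader %s: %s" % (interp, state)
```

For example, `print(loader_report("./CUDALucas"))` would show the requested loader and whether it is present. `readelf -l` reports the same field as "Requesting program interpreter".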

paulunderwood 2015-02-21 18:23

What does:

[CODE]$PATH[/CODE]

give :question:

ET_ 2015-02-21 18:25

[QUOTE=paulunderwood;396008]What does:

[CODE]$PATH[/CODE]

give :question:[/QUOTE]

[code]
luigi@Moreware:~/luigi/CUDA/cudaLucas/CUDALucas-2.05$ echo $PATH
/usr/local/cuda-6.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
luigi@Moreware:~/luigi/CUDA/cudaLucas/CUDALucas-2.05$ echo $LD_LIBRARY_PATH
/usr/local/cuda-6.5/lib64:/usr/local/lib:
luigi@Moreware:~/luigi/CUDA/cudaLucas/CUDALucas-2.05$
[/code]

and mfaktc is happily running. :smile:

Luigi

paulunderwood 2015-02-21 18:52

Maybe you need to do this:

[CODE]sudo apt-get install ia32-libs[/CODE]

:yzzyx:

ET_ 2015-02-21 18:58

[QUOTE=paulunderwood;396011]Maybe you need to do this:

[CODE]sudo apt-get install ia32-libs[/CODE]

:yzzyx:[/QUOTE]

Why? :unsure:

Luigi

paulunderwood 2015-02-21 19:02

"Why?" is a good question -- it fixed [URL="http://ubuntuforums.org/showthread.php?t=1054621&page=3"]this guy's problem[/URL] which seems similar to yours. :smile:

ET_ 2015-02-21 19:07

[QUOTE=paulunderwood;396014]"Why?" is a good question -- it fixed [URL="http://ubuntuforums.org/showthread.php?t=1054621&page=3"]this guy's problem[/URL] which seems similar to yours. :smile:[/QUOTE]

No candidates to install :no:

Luigi

owftheevil 2015-02-21 21:14

How much trouble would it be to compile the source locally? My build environment might be too different from yours.

ET_ 2015-02-22 10:50

[QUOTE=owftheevil;396034]How much trouble would it be to compile the source locally? My build environment might be too different from yours.[/QUOTE]

I've compiled other CUDA sources before, I suppose I can manage it with a sufficient makefile, thanks.

Luigi

diep 2015-02-22 12:10

Good Afternoon!

[url]http://www.anandtech.com/show/8069/nvidia-releases-geforce-gtx-titan-z[/url]

The 780 Ti is 8x slower in double precision than the Titan.
That's a hardware lobotomization. Of course the bandwidth of the thing is great.

How come, in this table: [url]http://www.mersenne.ca/cudalucas.php[/url]
the GeForce 780 looks fast at all and beats cards that certainly have more double-precision resources?

Was this a 780 where someone modified the chip and enabled the double-precision resources, like they managed to do at tomshardware?

That is not the same sort of "GeForce 780" you buy in the shop, which is a factor of 8 slower in double precision.
Is it fair to put it in the table like this?

axn 2015-02-22 12:31

I don't believe the 780/780 Ti numbers are from modified chips. CUDALucas computation is very sensitive to memory bandwidth. Titan has at least 6x the DP FLOPS of a 580 and is only 2x as fast. Are you suggesting that even the 580 numbers are "modified"?

EDIT:- From the table here ([url]http://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_900_Series[/url]), 980 Ti should be slightly faster than 780 Ti in CuLu

diep 2015-02-22 12:36

If it's just bandwidth dependent then that would explain it.

That's a junk FFT implementation of course.

A double-precision float is 8 bytes.
So a Tesla K20X can deliver, for example, 1.31 Tflops in double precision,
yet if the code is limited by bandwidth you'll not even get 10% of that out of the card. Ditto for the Titan.

And I am always worried if benchmark tables show only the single-precision performance,
whereas FFT/DWT is a double-precision exercise.

James Heinrich 2015-02-22 12:50

I haven't had any benchmarks submitted that deviate more than ~10% from expected according to my chart.

Of course, I've had very few benchmarks submitted. I would encourage anyone reading this thread to please submit new benchmarks using CUDALucas v2.05

ET_ 2015-02-22 14:36

[QUOTE=James Heinrich;396071]I haven't had any benchmarks submitted that deviate more than ~10% from expected according to my chart.

Of course, I've had very few benchmarks submitted. I would encourage anyone reading this thread to please submit new benchmarks using CUDALucas v2.05[/QUOTE]

I will send you mine on the GTX 980 as soon as I grab some code... :smile:

Luigi

axn 2015-02-22 16:00

[QUOTE=diep;396070]That's a junk FFT implementation of course. [/QUOTE]

FWIW, CuLu uses the cuFFT library provided by Nvidia itself for the FFT. The real problem, I think, is that Nvidia GPUs (pre-Maxwell) have very little L2 cache, and hence are much more reliant on memory bandwidth.

In fact, we're seeing a similar thing on the CPU side as well - with the advent of AVX, we're seeing "large FFT" performance that scales almost perfectly with memory bandwidth. But thanks to the relatively large L3 caches, the effects are somewhat countered at lower FFT sizes.
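The bandwidth argument can be put into rough numbers with a simple roofline-style estimate. The sketch below is only a back-of-envelope illustration: the 1.31 Tflops figure is the one quoted earlier in the thread, the 250 GB/s memory bandwidth for the K20X is an assumed spec-sheet value, and the arithmetic intensity of 0.5 flops per byte is a made-up placeholder, not a measurement of cuFFT.

```python
def machine_balance(peak_flops, bandwidth_bytes_per_s):
    """Flops the card can do per byte it can move to or from memory."""
    return peak_flops / bandwidth_bytes_per_s


def attainable_flops(peak_flops, bandwidth_bytes_per_s, flops_per_byte):
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, bandwidth_bytes_per_s * flops_per_byte)


K20X_PEAK_DP = 1.31e12    # double-precision flops/s, as quoted in the thread
K20X_BANDWIDTH = 250e9    # bytes/s, ASSUMED spec-sheet value
ASSUMED_INTENSITY = 0.5   # flops per byte, hypothetical placeholder

balance = machine_balance(K20X_PEAK_DP, K20X_BANDWIDTH)
bound = attainable_flops(K20X_PEAK_DP, K20X_BANDWIDTH, ASSUMED_INTENSITY)
print("balance point: %.2f flops/byte" % balance)
print("attainable at assumed intensity: %.1f%% of peak" % (100 * bound / K20X_PEAK_DP))
```

Any kernel whose arithmetic intensity sits below the balance point is limited by memory traffic rather than by the double-precision units, which would be consistent with cards that have few DP resources but high bandwidth doing well in the table.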

tenethor 2015-02-25 20:02

V2.05.1 Fails all self tests
 
Hello all

Just got a new (to me) Nvidia GTX 590 and am working on getting CUDALucas running. After some fighting with drivers and toolkits I got it running, but it fails every self test.

[CODE]Using threads: square 256, splice 128.
Starting self test M57885161 fft length = 3136K
Iteration 10000 / 57885161, 0x0000000000000000, 3136K, CUDALucas v2.05.1, error = 0.00000, real: 0:49, 4.9186 ms/iter
Expected residue [76c27556683cd84d] does not match actual residue [0000000000000000][/CODE]

Just an example. All twenty return the same residue of 0.

I did finally settle on nvidia driver 340.29 and CUDA toolkit 6.5

any suggestions?

Thanks

tenethor 2015-02-26 17:22

Well, disregard that one. After a little more poking around I found the place in the makefile where you have to set the compute version you want to compile for. Fixed that and now we're cranking out LL tests.
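For anyone wondering why an all-zero residue is such a clear sign that the GPU code never really ran: in a Lucas-Lehmer test the value is squared and reduced at every iteration, so it should only hit zero at the very end, and only if the Mersenne number is prime. The Python sketch below is a plain CPU illustration with made-up function names; it assumes the reported residue is the low 64 bits of the current value, and it is nothing like the FFT-based code in CUDALucas.

```python
def ll_state(p, iterations):
    """Run `iterations` Lucas-Lehmer steps for M_p = 2**p - 1, starting from s = 4."""
    m = (1 << p) - 1
    s = 4
    for _ in range(iterations):
        s = (s * s - 2) % m
    return s


def residue64(s):
    """Low 64 bits of the current value, as 16 hex digits (assumed residue format)."""
    return format(s & 0xFFFFFFFFFFFFFFFF, "016x")


def is_mersenne_prime(p):
    """For an odd prime p, M_p is prime exactly when the state is 0 after p - 2 steps."""
    return ll_state(p, p - 2) == 0


print(residue64(ll_state(7, 2)))   # an interim residue for the tiny case p = 7
print(is_mersenne_prime(13), is_mersenne_prime(11))
```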

flashjh 2015-02-26 18:06

I was going to post that it looked like you were missing the correct compute for your card.

Happy Hunting!

ET_ 2015-02-26 18:09

[QUOTE=ET_;396063]I've compiled other CUDA sources before, I suppose I can manage it with a sufficient makefile, thanks.

Luigi[/QUOTE]

bump :smile:

flashjh 2015-02-26 18:18

All the source code including makefiles are on sourceforge

[url]http://sourceforge.net/p/cudalucas/code/HEAD/tree/tags/v2.05.1-final/[/url]

owftheevil 2015-02-27 02:49

ET, just grab the files in the directory flashjh mentioned. Open the Makefile in a text editor. There are some instructions at the top of the file on how to set the two parameters necessary to shape the build for your system. For you,

[CODE]
CUDA = /usr/local/cuda-6.5
[/CODE]
is probably appropriate, and on the CUFLAGS line,

[CODE]
CUFLAGS = -O$(OptLevel) --generate-code arch=compute_35,code=sm_35 --compiler-options=-Wall -I$(CUINC)
[/CODE]

or

[CODE]
CUFLAGS = -O$(OptLevel) --generate-code arch=compute_50,code=sm_50 --compiler-options=-Wall -I$(CUINC)
[/CODE]

Save the file, then run

[CODE]
make
[/CODE]

in the directory where all these files are located.

You won't be able to use arch=compute_52,code=sm_52 unless you have CUDA-7.0 installed.
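To avoid typos in that flag, the string can be generated from the card's compute capability. The helper below is a hypothetical convenience script, not part of the CUDALucas source. The only toolkit requirement it encodes is the one stated above (compute 5.2 needs CUDA 7.0); anything else would have to be checked against the nvcc documentation.

```python
def gencode_flag(compute_capability):
    """Build the nvcc flag for one compute capability, e.g. "3.5" -> compute_35 / sm_35."""
    major, minor = compute_capability.split(".")
    tag = major + minor
    return "--generate-code arch=compute_{0},code=sm_{0}".format(tag)


# Minimum toolkit version per compute capability. Only the entry mentioned in
# the thread is listed; other capabilities are treated as unrestricted here.
MIN_TOOLKIT = {"5.2": (7, 0)}


def toolkit_ok(compute_capability, toolkit_version):
    """True if `toolkit_version`, e.g. (6, 5), meets the known minimum for the capability."""
    return toolkit_version >= MIN_TOOLKIT.get(compute_capability, (0, 0))


print(gencode_flag("3.5"))
print(toolkit_ok("5.2", (6, 5)))
```

The printed flag can be pasted into the CUFLAGS line of the Makefile in place of the existing --generate-code option.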

MacFactor 2015-02-27 05:31

Well, I notice that the libcufft.so files are for CUDA 4.1, not 4.2, and the Linux libs haven't been updated in two years. Anyone know where to find them?

After installing the CUDAlib files for CUDALucas, I found that mfaktc now returns an error message, telling me I have the wrong CUDA version (4.1) and that I need the version used in compilation (4.2). What makes this strange is that I included the name of the folder (mfaktc_libs) in the path name in the mfaktc.conf file, and the 4.1 files are in a folder called CUDAlibs. I would have thought that the specified folder would be searched first, but evidently not. Simply by moving the 4.1 files into a subfolder I was able to get mfaktc running again. But it makes me suspect that I need the libcufft.so files for version 4.2. I notice the Windows files are much more recent.

I've been searching Nvidia's site for the .so files, but was working from a Windows computer and couldn't see what was in those linux .run files. Will login again from a Linux Mint system and see if Nvidia has the files I need. I sure don't want to run another CUDA installer, though -- that was a headache.
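When two folders hold different versions of the CUDA libraries, it can help to see which copy a directory search would hit first. The sketch below is a deliberately simplified illustration: it only walks a colon-separated list such as LD_LIBRARY_PATH in order, whereas the real dynamic loader also consults rpath entries, the ld.so cache and the default directories. The function name and the example paths are made up.

```python
import os


def first_match(soname, search_path, exists=os.path.exists):
    """Return the first directory entry in `search_path` that contains `soname`, or None."""
    for directory in search_path.split(":"):
        if not directory:
            # The real loader treats an empty entry as the current directory;
            # this sketch simply skips it.
            continue
        candidate = os.path.join(directory, soname)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    for name in ("libcudart.so.4.2", "libcufft.so.4.2"):
        print(name, "->", first_match(name, path))
```

Running `ldd` on the executable remains the authoritative check, since it shows what the loader actually resolves.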

MacFactor 2015-02-27 05:35

It didn't do anything for me.

I'm trying to get more current libcufft.so files -- hope that will fix it.

owftheevil 2015-02-27 13:27

I'm currently uploading the cuda library files (4.2 - 6.5) to sourceforge. I have hesitated to do this before because the files are big and I have incredibly slow and unreliable internet. Best estimate is a probability of 0.5 that they are up there some time in the next 24 hours.

ET_ 2015-02-27 16:48

1 Attachment(s)
Thank you folks! :bow:

I'm happily testing my new CuLu executable; meanwhile, here are the benchmarks for James.

Luigi


All times are UTC.