mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

LaurV 2012-03-19 03:27

You guys (msft, Prime95, flashjh) are brilliant! Love you!

If you could add a way to change the aggressive_f variable interactively (as I tried to explain in my previous post; I don't know if I succeeded), I would love you even more! Any key combination or an external ini file (re-read every time a checkpoint is saved) would do. And if I find a prime with CL, I swear we'll split the bill :D

Eagerly waiting for binaries....
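
The ini-file idea could look roughly like the sketch below: each time a checkpoint is written, the program re-reads a small key=value file and picks up a new value for aggressive_f. This is only an illustration, not CUDALucas code. The function names `parse_aggressive` and `reload_aggressive`, the `aggressive_f=` line format, and any file name passed in are assumptions.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: scan "key=value" lines and return the value of
// aggressive_f if one is present; otherwise keep the current setting.
int parse_aggressive(std::istream& in, int current) {
    assert(current == 0 || current == 1);
    const std::string key = "aggressive_f=";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            std::istringstream value(line.substr(key.size()));
            int v;
            if (value >> v) current = (v != 0) ? 1 : 0;  // malformed values are ignored
        }
    }
    return current;
}

// Hypothetical wrapper meant to be called right after a checkpoint is saved.
// A missing or unreadable file leaves the setting unchanged.
int reload_aggressive(const std::string& path, int current) {
    std::ifstream file(path.c_str());
    if (!file) return current;
    return parse_aggressive(file, current);
}
```

Because the file is only read at checkpoint time, a change would take effect at the next checkpoint rather than immediately, which matches the request above.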

Dubslow 2012-03-19 03:35

[QUOTE=LaurV;293437]You guys (msft, Prime95, flashjh) are brilliant! Love you!

If you could add a way to change the aggressive_f variable interactively (as I tried to explain in my previous post; I don't know if I succeeded), I would love you even more! Any key combination or an external ini file (re-read every time a checkpoint is saved) would do. And if I find a prime with CL, I swear we'll split the bill :D

Eagerly waiting for binaries....[/QUOTE]
Hack the Prime95 .txt parser? I don't think my skills are quite up to snuff yet, but in a month or two, maybe.

Alternatively, see Craig's bash script from earlier in the thread: you edit the options in the file and then run it. Obviously this is not portable (not to mention the extra file).

flashjh 2012-03-19 03:38

1 Attachment(s)
[QUOTE=msft;293434][code]
#ifdef _MSC_VER
#include <winsock2.h>
extern "C" int gettimeofday(struct timeval *tv, struct timezone *tz);
#else
#include <sys/time.h>
#include <unistd.h>
#endif
[/code]to
[code]
#ifdef _MSC_VER
typedef struct timeval
{
long tv_sec;
long tv_usec;
} timeval;
int gettimeofday (struct timeval *tv, struct timezone *);
#else
#include <sys/time.h>
#include <unistd.h>
#endif
[/code]I guess this fixes it.
Thanks.[/QUOTE]
Yes, I had found it, and that was it. What is 'extern "C"'?

Anyway, attached are v1.67 x64 binaries (untested):
[LIST]
[*]CUDA 4.0 / SM 2.0
[*]CUDA 4.1 / SM 2.0
[*]CUDA 4.1 / SM 2.1
[/LIST]
@msft: I will work on CUDA 3.2 in a while. I have to install VS2008 before I can compile. Do you want 64 or 32 bit and what SM?

Prime95 2012-03-19 03:46

[QUOTE=flashjh;293441]Yes, I had found it, but that was certainly it. What is 'extern "C"'?[/QUOTE]

I added extern "C" to make MSVC 2010 happy. Extern "C" turns off C++ name mangling for the declaration, giving it C linkage.
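
A small sketch of what extern "C" changes, for anyone else wondering. The function names here are made up for illustration and are not taken from the CUDALucas source.

```cpp
#include <cassert>

// C++ linkage: the compiler is free to mangle the symbol name so that it
// encodes the parameter types. That is what makes overloading possible.
int elapsed_ms(int start, int end) { return end - start; }
long elapsed_ms(long start, long end) { return end - start; }

// C linkage: the symbol is emitted under its plain C name, so the linker can
// match it against a definition compiled as C or declared in a C header.
// Functions with C linkage cannot be overloaded.
extern "C" int c_elapsed_ms(int start, int end) {
    assert(end >= start);
    return end - start;
}
```

If a declaration and its definition disagree about linkage, the linker looks for a symbol name that does not exist and reports an unresolved external, which is the kind of error this change is meant to avoid.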

Is the new version faster for you? Does it work OK?

msft 2012-03-19 03:59

[QUOTE=flashjh;293441]@msft: I will work on CUDA 3.2 in a while. I have to install VS2008 before I can compile. Do you want 64 or 32 bit and what SM?[/QUOTE]
I believe 64 bit & SM1.3 is best.

flashjh 2012-03-19 04:06

[QUOTE=Prime95;293442]I added extern "C" to make MSVC 2010 happy. Extern "C" overrides name-mangling.

Is the new version faster for you? Does it work OK?[/QUOTE]
I haven't tested yet. I'm compiling remotely and I can't stop CUDALucas remotely because it won't restart (it doesn't detect the video card when I remote in). I'll start it in the morning and let you know. Thanks for your work on this.

[QUOTE=msft;293444]I believe 64 bit & SM1.3 is best.[/QUOTE]
I'll work on it; hopefully I'll have something in the morning :smile:

kladner 2012-03-19 04:18

Thanks to all the people who do real development work: msft, Prime95, flashjh, apsen... (please forgive omissions).

I am just a button pusher. Your work makes it possible for me to contribute.

msft 2012-03-19 04:32

[QUOTE=LaurV;293437]If you could add a way to change the aggressive_f variable interactively (as I tried to explain in my previous post; I don't know if I succeeded), I would love you even more! Any key combination or an external ini file (re-read every time a checkpoint is saved) would do. And if I find a prime with CL, I swear we'll split the bill :D
[/QUOTE]
Menu mode ?
[code]
$ ./CUDALucas 756839 -m
DEVICE:0------------------------
name GeForce GTX 460
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 65536, CUDALucas v1.xx err = 2.686e-06 (0:04 real, 0.3998 ms/iter, ETA 4:55)
[B]
Ctrl-C
[/B]
Menu.

1. Write checkpoint/Exit
2. GPU agressive/Continue
3. GPU polite/Continue
4. Continue

Your choice: [B]3[/B]

GPU polite/Continue

Iteration 20000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 65536, CUDALucas v1.xx err = 2.686e-06 (0:04 real, 0.3998 ms/iter, ETA 4:55)
...
[/code]
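
A menu like the mock-up above could be wired up along these lines. This is a sketch under assumptions, not the actual CUDALucas implementation: the names `g_menu_requested`, `on_sigint`, `install_menu_handler`, `RunState`, and `apply_menu_choice` are hypothetical, and `aggressive` merely stands in for the aggressive_f setting. The signal handler only sets a flag; the main loop would poll it between iterations and show the menu there.

```cpp
#include <cassert>
#include <csignal>

// Set by the signal handler, polled by the main loop between iterations.
volatile std::sig_atomic_t g_menu_requested = 0;

// Keep the handler minimal: it only records that Ctrl-C was pressed.
extern "C" void on_sigint(int) { g_menu_requested = 1; }

void install_menu_handler() {
    void (*previous)(int) = std::signal(SIGINT, on_sigint);
    assert(previous != SIG_ERR);
    (void)previous;  // silence unused-variable warnings when NDEBUG is set
}

struct RunState {
    bool aggressive;             // stand-in for the aggressive_f setting
    bool exit_after_checkpoint;  // set by menu item 1
};

// Apply one menu choice; numbering follows the mock-up above.
// Returns false for an unknown choice so the caller can ask again.
bool apply_menu_choice(int choice, RunState& state) {
    switch (choice) {
        case 1: state.exit_after_checkpoint = true; return true;
        case 2: state.aggressive = true; return true;
        case 3: state.aggressive = false; return true;
        case 4: return true;
        default: return false;
    }
}
```

In the iteration loop, the program would check `g_menu_requested`, reset it to 0, print the menu, read a number, and pass it to `apply_menu_choice`.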

Karl M Johnson 2012-03-19 05:46

Can't verify that the new version works on the lowest exponent.

[CODE]
CUDALucas.exe -d 1 -threads 512 -c 250000 -t -agressive 6972593
DEVICE:1------------------------
name GeForce GTX 480
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1640000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 15

start M6972593 fft length = 393216
Iteration 250000 M( 6972593 )C, 0x35380a283f796d25, n = 393216, CUDALucas v1.67 err = 0.05391 (4:59 real, 1.1974 ms/iter, ETA 2:09:43)
Iteration 500000 M( 6972593 )C, 0x352d2af55f663b4b, n = 393216, CUDALucas v1.67 err = 0.05436 (5:05 real, 1.2168 ms/iter, ETA 2:06:45)
err = 0.496404, increasing n from 393216

start M6972593 fft length = 393216
err = 0.475974, increasing n from 393216

start M6972593 fft length = 458752
err = 0.491302, increasing n from 458752

start M6972593 fft length = 458752
err = 0.502144, increasing n from 458752

start M6972593 fft length = 491520
err = 0.497817, increasing n from 491520

start M6972593 fft length = 524288
Iteration 250000 M( 6972593 )C, 0xf1d9662d06b8d174, n = 524288, CUDALucas v1.67 err = 0.0001439 (6:29 real, 1.5593 ms/iter, ETA 2:48:55)
err = 0.490387, increasing n from 524288

start M6972593 fft length = 589824
err = 0.49439, increasing n from 589824

start M6972593 fft length = 589824
err = 0.49206, increasing n from 589824

start M6972593 fft length = 655360
err = 0.70221, increasing n from 655360

start M6972593 fft length = 786432
err = 2.2281, increasing n from 786432

start M6972593 fft length = 786432
err = 1.61712, increasing n from 786432

start M6972593 fft length = 917504
err = 31.9002, increasing n from 917504

start M6972593 fft length = 1048576
err = 7.12894, increasing n from 1048576

start M6972593 fft length = 1179648
err = 73.7854, increasing n from 1179648

start M6972593 fft length = 1572864
err = 325.864, increasing n from 1572864

start M6972593 fft length = 1835008
err = 1.84836, increasing n from 1835008

start M6972593 fft length = 2359296
err = 0.499349, increasing n from 2359296

start M6972593 fft length = 3670016
err = 58.862, increasing n from 3670016

start M6972593 fft length = 7340032
err = 2019.66, increasing n from 7340032

****APP CRASHES HERE****
[/CODE]

LaurV 2012-03-19 05:47

@msft: You sir made my day today!

P.S. I have another 3 matches with v1.65, in the 26M range, which makes the score 11 to 2. The two mismatches were definitely hardware errors on my side, as a re-test showed. The tests were done with the default FFT size in the beginning, and later with a lower FFT size (I don't know exactly when I switched, maybe for the last 5 or 6 DC tests). With this said, I consider CUDALucas v1.65 and higher a very reliable tool, assuming that:

- [B]you do not overclock[/B]!
- you do not go too low with the FFT size (always use the default, or stay where the error is no higher than 0.15). A lower FFT size, i.e. a higher error like 0.22+, is dangerous: you will eventually get an abort when the error on some particular iteration rises above 0.45, and you will lose time repeating the last iterations with a larger FFT.
- always use -t. This makes it a bit slower but more reliable: you avoid retesting large ranges of iterations, as -t spots hardware errors at once.
- you can compensate for the speed cost of -t by lowering the FFT size a bit, until the error is around 0.15 (up from 0.07 or 0.09 with the default FFT).
- always use -s. If you are worried about disk space, use a larger -c, like every 100k, 250k, 400k, or 1 million iterations. No matter what cards you have (I tested a [B]GTX580[/B], but also a [B]Tesla C2050[/B]), you [B]WILL have[/B] occasional hardware errors, and then it is more convenient to repeat the last million iterations, for example, than to repeat the whole test from the beginning. When you have a match, delete the backup folder.
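
To make the error figures above concrete: in a floating-point FFT multiplication every output element should come out as an integer, and the round-off error is usually taken as the largest distance of any element from its nearest integer. The sketch below computes that and sorts the result into the rough bands from this post. The names `max_roundoff`, `ErrLevel`, and `classify` are hypothetical, and the 0.15 and 0.45 cut-offs are my rules of thumb from above, not constants taken from the program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest distance of any element from its nearest integer.
double max_roundoff(const std::vector<double>& values) {
    double worst = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        double nearest = std::floor(values[i] + 0.5);
        double frac = std::fabs(values[i] - nearest);
        if (frac > worst) worst = frac;
    }
    return worst;
}

enum ErrLevel { ERR_COMFORTABLE, ERR_RISKY, ERR_TOO_HIGH };

// Rule-of-thumb bands: up to 0.15 is comfortable, above 0.45 the result
// can no longer be trusted, anything in between is a gamble.
ErrLevel classify(double err) {
    assert(err >= 0.0);
    if (err <= 0.15) return ERR_COMFORTABLE;
    if (err <= 0.45) return ERR_RISKY;
    return ERR_TOO_HIGH;
}
```

An error close to 0.5 means an element sits almost halfway between two integers, so rounding may pick the wrong one; that is why a larger FFT is needed at that point.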

msft 2012-03-19 06:16

[QUOTE=Karl M Johnson;293457]Can't verify that the new version works on the lowest exponent.
[/QUOTE]
Please try.
./CUDALucas -d 1 -r
./CUDALucas -d 1 -threads -r

# Last weekend, I tried to install AMD OpenCL and destroyed my CUDA development environment. :smile:

