mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)

kladner 2012-09-25 14:56

I don't know the average error at the beginning, but I can restart it and see what comes out.

At the moment it is still running the continuation with -t 0 and has passed the sticking point. I'll be happy to run any tests requested.

Update: [CODE]Iteration 25100000 M( 27278xxx )C, 0x45cc61216a1a3dce, n = 1440K, CUDALucas v2.04 Beta err = 0.2656 (8:41 real, 5.2105 ms/iter, ETA 3:02:22)[/CODE]
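For anyone wondering how an ETA like the one in that status line relates to the ms/iter figure, here is a rough sketch. It is not CUDALucas code: it assumes an LL test of M(p) runs p - 2 squaring iterations in total and that the current ms/iter rate holds, so it may not match the printed ETA exactly (the program may average its timings differently). The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not CUDALucas code): seconds left in an LL test of M(p),
 * assuming p - 2 squaring iterations in total and a steady ms/iter rate. */
static long eta_seconds(long p, long iter, double ms_per_iter)
{
    long remaining;

    assert(p > 2 && iter >= 0 && ms_per_iter > 0.0);
    remaining = (p - 2) - iter;
    if (remaining < 0)
        remaining = 0;
    /* round to the nearest whole second */
    return (long) (remaining * ms_per_iter / 1000.0 + 0.5);
}

/* Split a second count into hours, minutes and seconds for H:MM:SS display. */
static void split_hms(long sec, long *h, long *m, long *s)
{
    assert(sec >= 0);
    *h = sec / 3600;
    *m = (sec % 3600) / 60;
    *s = sec % 60;
}

/* Print the ETA in the same H:MM:SS shape as the status line. */
static void print_eta(long p, long iter, double ms_per_iter)
{
    long h, m, s;

    split_hms(eta_seconds(p, iter, ms_per_iter), &h, &m, &s);
    printf("ETA %ld:%02ld:%02ld\n", h, m, s);
}
```

For example, 10942 seconds splits into 3 h, 2 min, 22 s, i.e. the "3:02:22" format shown above.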

Dubslow 2012-09-25 15:02

[QUOTE=kladner;312747]I don't know the average error at the beginning, but I can restart it and see what comes out. [/quote]Thanks.
[QUOTE=kladner;312747]
At the moment it is still running the continuation with -t 0 and has passed the sticking point. I'll be happy to run any tests requested.[/QUOTE]
Since you asked for it... this isn't the first time a too-aggressive FFT length has been picked. If you can, please run a test of the dinky little program I posted in [URL="http://www.mersenneforum.org/showthread.php?p=306898#post306898"]this first discussion[/URL] of the issue. (Unfortunately, I can't compile it, so you'll have to ask flash or start playing around with MSVS -- it's a very simple program, and so should be quite a bit easier to compile than CUDALucas.)

Edit: Use this slight revision (MSVS is ancient and uses ancient rules). The discussion linked above is still worth a read though, IMO.
[code]#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void print_time_from_seconds (int sec) // copied almost verbatim from CuLu source
{
    if (sec > 3600)
    {
        printf ("%d", sec / 3600);
        sec %= 3600;
        printf (":%02d", sec / 60);
    }
    else
        printf ("%d", sec / 60);
    sec %= 60;
    printf (":%02d\n", sec);
}

int main(int argc, char** argv) {
    char* name, * newname;
    int q, n, j, old, new;
    size_t len;
    long t = 0; // stays 0 when the time field isn't read (2.03 save files)
    double* x;
    FILE* f;

    if( argc < 4 || !argv[1] || !argv[2] || !argv[3] ) {
        printf("First argument should be name of checkpoint file, second should be old FFT (full form, not K form), and third should be new FFT\n");
        return -1;
    }
    name = argv[1]; old = atoi(argv[2]); new = atoi(argv[3]);
    if( old <= 0 || new < old ) { // the residue is zero-padded, so the new FFT can't be smaller
        printf("Old length must be positive and new length at least as large, aborting\n");
        return -1;
    }
    f = fopen(name, "rb"); // Ignore compiler warnings about "secure functions"
    if( !f ) {
        printf("Couldn't open %s, aborting\n", name);
        return -1;
    }
    fread(&q, sizeof(int), 1, f);
    fread(&n, sizeof(int), 1, f);
    if( n != old) {
        printf("Supplied old length doesn't match checkpoint's old length, aborting\n");
        fclose(f);
        return -1;
    }
    fread(&j, sizeof(int), 1, f);
    x = (double*) calloc(new, sizeof(double)); // entries past the old length stay zero
    fread(x, sizeof(double), old, f);
    fread(&t, sizeof(long), 1, f); // comment out this line for 2.03 save files
    fclose(f);
    printf("This is a checkpoint for exp = %d, n = %dK, iter = %d, and total time = %ld = ", q, n/1024, j, t);
    print_time_from_seconds((int) t);
    printf("Converting from FFT %d to FFT %d\n", old, new);
    len = strlen(name)+1;
    newname = calloc((len+=4), sizeof(char)); // room for the ".new" suffix
    snprintf(newname, len, "%s.new", name);
    f = fopen(newname, "wb");
    if( !f ) {
        printf("Couldn't create %s, aborting\n", newname);
        return -1;
    }
    fwrite(&q, sizeof(int), 1, f);
    fwrite(&n, sizeof(int), 1, f); // writes the OLD length (corrected to &new later in the thread)
    fwrite(&j, sizeof(int), 1, f);
    fwrite(x, sizeof(double), new, f);
    fwrite(&t, sizeof(long), 1, f); // comment this out for 2.03 save files
    fclose(f);
    free(x);
    free(newname);
    printf("Written new checkpoint.\n");
    return 0;
}[/code]
[code]bill@Gravemind:~/CUDALucas∰∂ ckpconvert t27812929 1572864 1638400
This is a checkpoint for exp = 27812929, n = 1536K, iter = 140001, and total time = 869 = 14:29
Converting from FFT 1572864 to FFT 1638400
Written new checkpoint.[/code]
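The converter only accepts FFT lengths in full form, so 1536K has to be passed as 1536 * 1024 = 1572864 (and 1600K as 1638400), as in the run above. If you'd rather type the K form, a small parser along these lines could be dropped in. This is a hypothetical helper, not part of the posted program:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: accept an FFT length in full form ("1572864") or in
 * K form ("1536K" / "1536k") and return the full length, or -1 on bad input. */
static long parse_fft_length(const char *s)
{
    char *end;
    long v;

    assert(s != NULL);
    v = strtol(s, &end, 10);
    if (end == s || v <= 0)
        return -1;                 /* no digits, or not a positive length */
    if (*end == 'K' || *end == 'k') {
        v *= 1024;                 /* K form: multiply by 1024 */
        end++;
    }
    return *end == '\0' ? v : -1;  /* reject trailing junk */
}
```

It would replace the `atoi(argv[2])` / `atoi(argv[3])` calls, with a check for -1 before going on.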

kladner 2012-09-25 15:17

[QUOTE]If you can, please run a test of the dinky little program[/QUOTE]I'm afraid that's a bit out of my depth (compiling).

Flash, if you're watching this, can you help?

Give me a few minutes and I'll recreate the beginning info for the exponent.

EDIT: [CODE]Starting M27278xxx fft length = 1440K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.17082, max error = 0.24316
Iteration 200, average error = 0.19356, max error = 0.22656
Iteration 300, average error = 0.20363, max error = 0.25000
Iteration 400, average error = 0.20928, max error = 0.24316
Iteration 500, average error = 0.21295, max error = 0.27344
Iteration 600, average error = 0.21491, max error = 0.24609
Iteration 700, average error = 0.21601, max error = 0.24609
Iteration 800, average error = 0.21788, max error = 0.27344
Iteration 900, average error = 0.21798, max error = 0.23438
Iteration 1000, average error = 0.21805 < 0.25 (max error = 0.23438), continuing test.[/CODE]
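The "careful round off test" in that log comes down to a threshold check on the average error after 1000 iterations. Below is a minimal sketch, assuming a plain arithmetic mean of per-iteration round-off errors compared against 0.25. How CUDALucas actually samples and averages the error isn't shown in the log, and `needs_larger_fft` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the decision in the log: average the per-iteration round-off
 * errors over the test window and request a larger FFT if the average
 * reaches the threshold (0.25 in the log). Returns 1 for "use a larger FFT". */
static int needs_larger_fft(const double *err, int count, double threshold)
{
    double sum = 0.0;

    assert(err != NULL && count > 0);
    for (int i = 0; i < count; i++)
        sum += err[i];
    return sum / count >= threshold;
}
```

Note that a check like this only looks at the average. That is how the run above continued at 1440K even though individual max errors of 0.27344 showed up along the way.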

flashjh 2012-09-25 18:34

I compiled the program, but it doesn't work for me. I get the right output, but the checkpoint still contains the 'old' FFT length: I tried running it in CuLu 2.04beta and it still used the old FFT. Then I tried converting from new back to old, and that doesn't work either (see below).

[CODE]c:\CUDA\ck>ck c27232109 1572864 1638400
This is a checkpoint for exp = 27232109, n = 1536K, iter = 2272301, and total time = 52909 = 14:41:49
Converting from FFT 1572864 to FFT 1638400
c27232109.new
Written new checkpoint.

c:\CUDA\ck>ck c27232109.new 1638400 1572864
Supplied old length doesn't match checkpoint's old length, aborting[/CODE]

Any ideas, Dubslow? I would look at it more, but I have to get back to work for now.

Code I used is attached.
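A quick way to see which length a checkpoint really records is to read back just its header. This is a hypothetical helper that assumes the layout the converter uses (int exponent, int FFT length, int iteration, then the doubles) and the same `int` size and byte order as the machine that wrote the file:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read only the three leading ints of a checkpoint laid
 * out as in the converter (exponent q, FFT length n, iteration j).
 * Returns 0 on success, -1 on a short read. */
static int read_ckp_header(FILE *f, int *q, int *n, int *j)
{
    assert(f != NULL);
    if (fread(q, sizeof(int), 1, f) != 1) return -1;
    if (fread(n, sizeof(int), 1, f) != 1) return -1;
    if (fread(j, sizeof(int), 1, f) != 1) return -1;
    return 0;
}
```

Opened with `fopen("c27232109.new", "rb")`, this would have shown `n` still at 1572864, which matches the "doesn't match checkpoint's old length" abort above.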

kladner 2012-09-25 19:02

Thanks Jerry! :smile:

Here are my latest results. M27278527 completed and matched the previous test, so I submitted it. I caught it just after the next exponent had run the 1000 iterations.
[CODE]For 27278527
Iteration 1000, average error = 0.21805 < 0.25 (max error = 0.23438), continuing test.

For 27278xxx
Iteration 1000, average error = 0.22508 < 0.25 (max error = 0.26563), continuing test.[/CODE]Since the average and max errors for the latter are higher than with 27278527, I ran [CODE]-cufftbench 32768 3276800 32768[/CODE]I looked at the results, and the next larger efficient FFT is 1536K. I put that on the worktodo line, as instructed in CUDALucas.ini, like this: [CODE]DoubleCheck=[KEY],27278xxx,1536K[/CODE]
This yielded:
[CODE]Iteration 1000, average error = 0.04833 < 0.25 (max error = 0.05371), continuing test.[/CODE]

It hasn't run long enough to determine the timing, but it might be a bit slower than 1440K.
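Picking "the next larger efficient FFT" from the -cufftbench output can be sketched as a small search. This assumes one particular definition: a length counts as efficient when it is strictly faster than every larger length in the same benchmark. cufftbench itself may not define it this way. `next_efficient_fft` is a hypothetical name, and the lengths and timings would be transcribed by hand from the benchmark output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. Assumed definition: a benchmarked length is "efficient"
 * when it is strictly faster than every larger length in the same benchmark.
 * lengths[] must be sorted ascending, with ms[i] the time for lengths[i].
 * Returns the smallest efficient length strictly larger than current, or -1. */
static long next_efficient_fft(const long *lengths, const double *ms,
                               size_t count, long current)
{
    assert(lengths != NULL && ms != NULL);
    for (size_t i = 0; i < count; i++) {
        int efficient = 1;

        if (lengths[i] <= current)
            continue;
        for (size_t k = i + 1; k < count; k++) {
            if (ms[k] <= ms[i]) { /* a larger length is at least as fast */
                efficient = 0;
                break;
            }
        }
        if (efficient)
            return lengths[i];
    }
    return -1;
}
```

The brute-force inner loop is fine for a few hundred benchmark entries. A single backward pass that tracks the running minimum time would do the same job in linear time.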

Dubslow 2012-09-25 19:23

[QUOTE=flashjh;312755]I compiled the program, but it doesn't work for me. I get the right output, but the checkpoint still contains the 'old' FFT length (I tried running it in CuLu 2.04beta) and it still used the old FFT, so then I tried converting from new to old and it doesn't work, see below).

[CODE]c:\CUDA\ck>ck c27232109 1572864 1638400
This is a checkpoint for exp = 27232109, n = 1536K, iter = 2272301, and total time = 52909 = 14:41:49
Converting from FFT 1572864 to FFT 1638400
c27232109.new
Written new checkpoint.

c:\CUDA\ck>ck c27232109.new 1638400 1572864
Supplied old length doesn't match checkpoint's old length, aborting[/CODE]

Any ideas Dubslow, I would look at it more, but I have to get back to work for now?

Code I used is attached.[/QUOTE]
:doh!:

Line 56: "fwrite(&n, sizeof(int), 1, f);" should be "fwrite(&new, sizeof(int), 1, f);".
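With that correction, the header's length field agrees with the number of doubles that follow it. Here is a sketch of the whole write path as a standalone function. It assumes the same layout as the converter, including the trailing `long` time, so it is not for 2.03 save files. `write_ckp` is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a checkpoint in the converter's layout
 * (int q, int n, int j, double x[n], long t). The length field is the NEW
 * length, so it matches the number of doubles written. Returns 0 on success. */
static int write_ckp(FILE *f, int q, int new_len, int j, const double *x, long t)
{
    assert(f != NULL && x != NULL && new_len > 0);
    if (fwrite(&q, sizeof(int), 1, f) != 1) return -1;
    if (fwrite(&new_len, sizeof(int), 1, f) != 1) return -1;  /* new length, not the old n */
    if (fwrite(&j, sizeof(int), 1, f) != 1) return -1;
    if (fwrite(x, sizeof(double), (size_t) new_len, f) != (size_t) new_len) return -1;
    if (fwrite(&t, sizeof(long), 1, f) != 1) return -1;
    return 0;
}
```

Checking each `fwrite` count also catches a full disk or a failed open, which the one-off converter silently ignores.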

:davieddy:

kladner 2012-09-25 20:12

For very similar exponents, 1536K is ~0.34 ms/iter slower than 1440K on a GTX 460 (about 94% of the speed).

kladner 2012-09-25 22:12

[QUOTE=kladner;312759]For very similar exponents, 1536K is ~0.34 ms slower (94%) than 1440K on a GTX 460.[/QUOTE]

I have to walk this back. 1536K now seems to be about 7% faster. I'm not sure why there's a difference, though this is after a reboot.

Dubslow 2012-09-25 23:31

[QUOTE=kladner;312768]I have to walk this back. 1536K now seems to be about 7% faster. I'm not sure why the difference, though it is after a reboot.[/QUOTE]

:huh:

I was not expecting that.

flashjh 2012-09-25 23:48

[QUOTE=Dubslow;312775]:huh:

I was not expecting that.[/QUOTE]

Some testing still needs to be done, but LaurV put together a list of FFTs that perform better [URL="http://www.mersenneforum.org/showthread.php?p=310136#post310136"]here[/URL].

It may be worthwhile to run the same tests on your 460 and see if the results match.

kladner 2012-09-26 00:21

[QUOTE=flashjh;312777]Some testing still needs to be done, but LaurV put together a list of FFTs that perform better [URL="http://www.mersenneforum.org/showthread.php?p=310136#post310136"]here[/URL].

It may be worth while to do testing on your 460 and see if the results match.[/QUOTE]

I don't know why the timing went down. The previous exponent was getting ~5.2433 ms/iter, while the current one is doing ~4.8614 ms/iter, and the two exponents agree in at least their first five digits. Interestingly, 1536K isn't on LaurV's list.
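When comparing FFT timings like these, it helps to say which percentage is meant. Below is a hypothetical helper that measures time saved per iteration as a share of the old time:

```c
#include <assert.h>

/* Hypothetical helper: time saved per iteration, as a percentage of the old
 * time. Positive means the new run is faster; negative means it is slower. */
static double pct_time_saved(double old_ms, double new_ms)
{
    assert(old_ms > 0.0);
    return (old_ms - new_ms) / old_ms * 100.0;
}
```

With the figures above, `pct_time_saved(5.2433, 4.8614)` comes out at about 7.3, which fits the "about 7% faster" estimate. Measured as throughput instead (old / new - 1), the same numbers give about 7.9%.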

EDIT: The results of cufftbench are attached.


All times are UTC.