I don't know the average error at the beginning, but I can restart it and see what comes out.
At the moment it is still running the continuation with -t 0 and has passed the sticking point. I'll be happy to run any tests requested. Update: [CODE]Iteration 25100000 M( 27278xxx )C, 0x45cc61216a1a3dce, n = 1440K, CUDALucas v2.04 Beta err = 0.2656 (8:41 real, 5.2105 ms/iter, ETA 3:02:22)[/CODE] |
[QUOTE=kladner;312747]I don't know the average error at the beginning, but I can restart it and see what comes out. [/quote]Thanks.
[QUOTE=kladner;312747] At the moment it is still running the continuation with -t 0 and has passed the sticking point. I'll be happy to run any tests requested.[/QUOTE] Since you asked for it... this isn't the first time a too-aggressive FFT length has been picked. If you can, please run a test of the dinky little program I posted in [URL="http://www.mersenneforum.org/showthread.php?p=306898#post306898"]this first discussion[/URL] of the issue. (Unfortunately, I can't compile it, so you'll have to ask flash or start playing around with MSVS -- it's a very simple program, and so should be quite a bit easier to compile than CUDALucas.) Edit: Use this slight revision (MSVS is ancient and uses ancient rules). The discussion linked above is still worth a read though, IMO.
[code]#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void print_time_from_seconds (int sec) // copied almost verbatim from CuLu source
{
  if (sec > 3600)
    {
      printf ("%d", sec / 3600);
      sec %= 3600;
      printf (":%02d", sec / 60);
    }
  else
    printf ("%d", sec / 60);
  sec %= 60;
  printf (":%02d\n", sec);
}

int main(int argc, char** argv)
{
  char* name, * newname;
  int q, n, j, old, new, len;
  long t;
  double* x;
  FILE* f;

  if( argc < 4 || !argv[1] || !argv[2] || !argv[3] )
  {
    printf("First argument should be name of checkpoint file, second should be old FFT (full form, not K form), and third should be new FFT\n");
    return -1;
  }
  name = argv[1];
  old = atoi(argv[2]);
  new = atoi(argv[3]);

  f = fopen(name, "rb"); // Ignore compiler warnings about "secure functions"
  fread(&q, sizeof(int), 1, f);
  fread(&n, sizeof(int), 1, f);
  if( n != old)
  {
    printf("Supplied old length doesn't match checkpoint's old length, aborting\n");
    return -1;
  }
  fread(&j, sizeof(int), 1, f);
  x = (double*) calloc(new, sizeof(double));
  fread(x, sizeof(double), old, f);
  fread(&t, sizeof(long), 1, f); // comment out this line for 2.03 save files
  fclose(f);

  printf("This is a checkpoint for exp = %d, n = %dK, iter = %d, and total time = %ld = ", q, n/1024, j, t);
  print_time_from_seconds(t);
  printf("Converting from FFT %d to FFT %d\n", old, new);

  len = strlen(name)+1;
  newname = calloc((len+=4), sizeof(char));
  snprintf(newname, len, "%s.new", name);

  f = fopen(newname, "wb");
  fwrite(&q, sizeof(int), 1, f);
  fwrite(&n, sizeof(int), 1, f);
  fwrite(&j, sizeof(int), 1, f);
  fwrite(x, sizeof(double), new, f);
  fwrite(&t, sizeof(long), 1, f); // comment this out for 2.03 save files
  fclose(f);

  printf("Written new checkpoint.\n");
  return 127;
}[/code]
[code]bill@Gravemind:~/CUDALucas∰∂ ckpconvert t27812929 1572864 1638400
This is a checkpoint for exp = 27812929, n = 1536K, iter = 140001, and total time = 869 = 14:29
Converting from FFT 1572864 to FFT 1638400
Written new checkpoint.[/code] |
[QUOTE]If you can, please run a test of the dinky little program[/QUOTE]I'm afraid that's a bit out of my depth (compiling).
Flash, if you're watching this, can you help? Give me a few minutes and I'll recreate the beginning info for the exponent. EDIT: [CODE]Starting M27278xxx fft length = 1440K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.17082, max error = 0.24316
Iteration 200, average error = 0.19356, max error = 0.22656
Iteration 300, average error = 0.20363, max error = 0.25000
Iteration 400, average error = 0.20928, max error = 0.24316
Iteration 500, average error = 0.21295, max error = 0.27344
Iteration 600, average error = 0.21491, max error = 0.24609
Iteration 700, average error = 0.21601, max error = 0.24609
Iteration 800, average error = 0.21788, max error = 0.27344
Iteration 900, average error = 0.21798, max error = 0.23438
Iteration 1000, average error = 0.21805 < 0.25 (max error = 0.23438), continuing test.[/CODE] |
1 Attachment(s)
I compiled the program, but it doesn't work for me. I get the right output, but the checkpoint still contains the 'old' FFT length: I tried running it in CuLu 2.04beta and it still used the old FFT. I then tried converting from new back to old, and that doesn't work either (see below).
[CODE]c:\CUDA\ck>ck c27232109 1572864 1638400
This is a checkpoint for exp = 27232109, n = 1536K, iter = 2272301, and total time = 52909 = 14:41:49
Converting from FFT 1572864 to FFT 1638400
c27232109.new
Written new checkpoint.

c:\CUDA\ck>ck c27232109.new 1638400 1572864
Supplied old length doesn't match checkpoint's old length, aborting[/CODE] Any ideas, Dubslow? I would look at it more, but I have to get back to work for now. Code I used is attached. |
Thanks Jerry! :smile:
Here are my latest results. M27278527 completed and matched the previous test, so I submitted it. I caught it just after the next exponent had run the 1000 iterations. [CODE]For 27278527
Iteration 1000, average error = 0.21805 < 0.25 (max error = 0.23438), continuing test.
For 27278xxx
Iteration 1000, average error = 0.22508 < 0.25 (max error = 0.26563), continuing test.[/CODE]Since the average and max errors for the latter are higher than with 27278527, I ran [CODE]-cufftbench 32768 3276800 32768[/CODE]I looked at the results, and the next larger efficient FFT is 1536K. I put that on the worktodo line as instructed in CUDALucas.ini, like this: [CODE]DoubleCheck=[KEY],27278xxx,1536K[/CODE] This yielded: [CODE]Iteration 1000, average error = 0.04833 < 0.25 (max error = 0.05371), continuing test.[/CODE] It has not run long enough to determine the timing, but it might be a bit slower than 1440K. |
[QUOTE=flashjh;312755]I compiled the program, but it doesn't work for me. I get the right output, but the checkpoint still contains the 'old' FFT length: I tried running it in CuLu 2.04beta and it still used the old FFT. I then tried converting from new back to old, and that doesn't work either (see below).
[CODE]c:\CUDA\ck>ck c27232109 1572864 1638400
This is a checkpoint for exp = 27232109, n = 1536K, iter = 2272301, and total time = 52909 = 14:41:49
Converting from FFT 1572864 to FFT 1638400
c27232109.new
Written new checkpoint.

c:\CUDA\ck>ck c27232109.new 1638400 1572864
Supplied old length doesn't match checkpoint's old length, aborting[/CODE] Any ideas, Dubslow? I would look at it more, but I have to get back to work for now. Code I used is attached.[/QUOTE] :doh!: Line 56: "fwrite(&n, sizeof(int), 1, f);" should be "fwrite(&new, sizeof(int), 1, f);". :davieddy: |
For very similar exponents, 1536K is ~0.34 ms/iter slower (94% of the speed) than 1440K on a GTX 460.
|
[QUOTE=kladner;312759]For very similar exponents, 1536K is ~0.34 ms/iter slower (94% of the speed) than 1440K on a GTX 460.[/QUOTE]
I have to walk this back. 1536K now seems to be about 7% faster. I'm not sure why the difference, though it is after a reboot. |
[QUOTE=kladner;312768]I have to walk this back. 1536K now seems to be about 7% faster. I'm not sure why the difference, though it is after a reboot.[/QUOTE]
:huh: I was not expecting that. |
[QUOTE=Dubslow;312775]:huh:
I was not expecting that.[/QUOTE] Some testing still needs to be done, but LaurV put together a list of FFTs that perform better [URL="http://www.mersenneforum.org/showthread.php?p=310136#post310136"]here[/URL]. It may be worthwhile to do testing on your 460 and see if the results match. |
1 Attachment(s)
[QUOTE=flashjh;312777]Some testing still needs to be done, but LaurV put together a list of FFTs that perform better [URL="http://www.mersenneforum.org/showthread.php?p=310136#post310136"]here[/URL].
It may be worthwhile to do testing on your 460 and see if the results match.[/QUOTE] I don't know why the timing went down. The previous expo was getting ~5.2433 ms/iter, while the current one is doing ~4.8614 ms/iter. The two exponents share at least their first five digits. Interestingly, 1536K isn't on LaurV's list. EDIT: The results of cufftbench are attached. |