[QUOTE=Batalov;301099]I thought I was alone occasionally "being [strike]John Malkovich[/strike]".[/QUOTE]
Actually, I was channeling my "inner Mr. R. D. Silverman" when I wrote that. But any uptight pedant would do.... :smile: |
With the contents of the file being:
[CODE]
Test=N/A,216091
Test=N/A,756839
Test=N/A,500009
Test=N/A,859433
[/CODE]
I got
[CODE]Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

e:\CudaLucas>cl2024020x64 -d 0

------- DEVICE 0 -------
name GeForce GTX 580
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
Compatibility 2.0
clockRate (MHz) 1564
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16

Warning: ignoring line 1 in "worktodo.txt"! Reason: invalid format
Warning: ignoring line 2 in "worktodo.txt"! Reason: invalid format
Warning: ignoring line 3 in "worktodo.txt"! Reason: invalid format
Warning: ignoring line 4 in "worktodo.txt"! Reason: invalid format
No valid assignment found.[/CODE]
then adding a comma after the third line I got
[CODE]
e:\CudaLucas>cl2024020x64 -d 0

------- DEVICE 0 -------
name GeForce GTX 580
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
Compatibility 2.0
clockRate (MHz) 1564
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16

Warning: ignoring line 1 in "worktodo.txt"! Reason: invalid format
Warning: ignoring line 2 in "worktodo.txt"! Reason: invalid format[/CODE]
and the crash follows in about 1-2 seconds |
[QUOTE=msft;295750][code]
$ ./CUDALucas -threads 512 332220523
DEVICE:0------------------------
name GeForce GTX 550 Ti
totalGlobalMem 1072889856
...
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368
start M332220523 fft length = 20971520
Iteration 10000 M( 332220523 )C, 0x1a313d709bfa6663, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:30 real, 134.9292 ms/iter, ETA 12451:20:29)
Iteration 20000 M( 332220523 )C, 0x73dc7a5c8b839081, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:26 real, 134.5456 ms/iter, ETA 12415:34:17)
[/code][/QUOTE]
just started....
>CUDALucas2.01.cuda4.0.sm_21.x64.exe -threads 512 332XXXXXX
start M332XXXXXX fft length = 16777216
iteration = 22 < 1000 && err = 0.28125 >= 0.25, increasing n from 16777216
start M332XXXXXX fft length = 18874368
iteration = 29 < 1000 && err = 0.265625 >= 0.25, increasing n from 18874368
start M332XXXXXX fft length = 20971520
Iteration 10000 M( 332XXXXXX )C, 0x9786a346ab86fc52, n = 20971520, CUDALucas v2.01 err = 0.03516 (11:02 real, 66.2000 ms/iter, ETA 6108:47:24) |
Debugging it on my end was useless...
1 Attachment(s)
[QUOTE=Dubslow;301178]I'll run it through with -O0 and see if anything appears there.[/QUOTE]
Piddly squat. I await flash's report. I'm stumped. (chalsall, for the sake of the software, could you please PM/tell flash what to fix so that it works?) |
You're calling strcpy_s with the wrong arguments. This is a problem in the strcopy call on line 230 specifically, and maybe in other places. Note that strcpy_s fails when the source is too long for the destination, whereas strncpy just truncates.
Also, you might want to change the variables which hold the exponent to unsigned types, since you're filling them with strtoul. |
[QUOTE=Dubslow;301093]And it's not something that would make Linux crash...? Did you use a debugger, or just look at the code and find it?
[/QUOTE] If code behaves differently on Linux and Windows the first thing to check is that lines end with crlf on Windows and lf on Linux. Try running unix2dos against the config file on Linux and see what happens. Chris |
[QUOTE=kjaget;301210]You're calling strcpy_s with the wrong arguments. This is a problem in the strcopy call on line 230 specifically, and maybe in other places. Note that strcpy_s fails when the source is too long for the destination, whereas strncpy just truncates.[/QUOTE]
[url]http://msdn.microsoft.com/en-us/library/td1esda9(v=vs.80).aspx[/url]
[code]errno_t strcpy_s(
   char *[U]strDestination[/U],
   size_t [U]numberOfElements[/U],
   const char *[U]strSource[/U]
);[/code]
[code]void strcopy(char* dest, char* src, size_t n)
{
#ifdef linux
  strncpy(dest, src, n);
#else
  strcpy_s([U]dest[/U], [U]n[/U], [U]src[/U]);
#endif
}[/code]
[QUOTE=chris2be8;301231]If code behaves differently on Linux and Windows the first thing to check is that lines end with crlf on Windows and lf on Linux. Try running unix2dos against the config file on Linux and see what happens. Chris[/QUOTE]
Thanks, that's an excellent suggestion. I'll go do that now.
Edit: Sadly, it still worked fine. I'll run it through the debugger though.
[code]bill@Gravemind:~/CUDALucas∰∂ unix2dos worktodo.txt
unix2dos: converting file worktodo.txt to DOS format ...
bill@Gravemind:~/CUDALucas∰∂ CUDALucas

Starting M25994671 fft length = 1474560[/code] |
got a big performance increase with 2.02. 4.1 with my gtx 680
on a 65xxxxxx exponent, 2.01 gives me 7.6ms/iteration; this version gives 6.6ms/iteration. was that expected??? |
[QUOTE=Redarm;301290]got a big performance increase with 2.02. 4.1 with my gtx 680
on a 65xxxxxx exponent, 2.01 gives me 7.6ms/iteration; this version gives 6.6ms/iteration. was that expected???[/QUOTE] I didn't change a thing math-wise. Perhaps you chose a better FFT length? Edit: Could be new drivers, the 680 is still rather new. Or, if previously you were using arch>=2.0 but are now using arch=1.3, you might see some performance gains. PS: Are you in Windows, and if so, does it crash for you like above? |
[QUOTE=Dubslow;301246][url]http://msdn.microsoft.com/en-us/library/td1esda9(v=vs.80).aspx[/url]
[code]errno_t strcpy_s(
   char *[U]strDestination[/U],
   size_t [U]numberOfElements[/U],
   const char *[U]strSource[/U]
);[/code]
[code]void strcopy(char* dest, char* src, size_t n)
{
#ifdef linux
  strncpy(dest, src, n);
#else
  strcpy_s([U]dest[/U], [U]n[/U], [U]src[/U]);
#endif
}[/code][/QUOTE]
Keep in mind that numberOfElements is the size of the destination buffer, not the number of characters you wish to copy as it is in strncpy().
[QUOTE]If strDestination or strSource is a null pointer, [B]or if the destination string is too small[/B], the invalid parameter handler is invoked as described in Parameter Validation. [/QUOTE]
What is the length of the destination string you're passing in from this call, and how long is the source string:
strcopy(assignment->assignment_key,ptr,1+(strstr(ptr,",")-ptr) ); // copy the comma..
You can add a check in strcopy to see the problem in Linux. Print n and strlen(src) and then dereference a NULL pointer if n is smaller. You'll see a crash in basically the same place as you would on Windows.
Or just use strncpy() in Windows as well as Linux, and add in the dest[n-1] = '\0'; line recommended earlier. |
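A minimal sketch of the "use strncpy-style copying everywhere and terminate explicitly" suggestion, in portable C. The name `bounded_copy` and its return convention are made up for this example and are not part of the CUDALucas source; the point is only that the destination size and the number of characters to copy are two separate arguments, which is the distinction strcpy_s and strncpy disagree on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the CUDALucas source: copy at most n characters
 * of src into dest, which holds dest_size bytes, and always NUL-terminate.
 * Returns 0 on success, -1 if the result had to be cut short to fit. */
int bounded_copy(char *dest, size_t dest_size, const char *src, size_t n)
{
    assert(dest != NULL && src != NULL && dest_size > 0);

    size_t len = strlen(src);
    if (len > n)
        len = n;                 /* honour the caller's character limit */

    int truncated = 0;
    if (len > dest_size - 1) {
        len = dest_size - 1;     /* leave room for the terminator */
        truncated = 1;
    }

    memcpy(dest, src, len);
    dest[len] = '\0';            /* plain strncpy does not guarantee this */
    return truncated ? -1 : 0;
}
```

With a call such as `bounded_copy(key, sizeof key, ptr, 1 + (strstr(ptr, ",") - ptr))` the buffer size and the "copy up to the comma" count can no longer be confused, and an oversized source shows up as a return value rather than a crash.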
1 Attachment(s)
[QUOTE=kjaget;301298]Keep in mind that numberOfElements is the size of the destination buffer, not the number of characters you wish to copy as it is in strncpy().
[/QUOTE]
GAH

Leave it to Microsoft to complicate everything
[code]void strcopy(char* dest, char* src, size_t n)
{
#ifdef linux
  strncpy(dest, src, n);
#else
  strncpy_s(dest, MAX_LINE_LENGTH+1, src, n);
#endif
}[/code]
[QUOTE=kjaget;301298] You can add a check in strcopy to see the problem in Linux. Print n and strlen(src) and then dereference a NULL pointer if n is smaller. You'll see a crash in basically the same place as you would on Windows. [/QUOTE]
Makes sense. Like you pointed out, I was thinking it would stop after n chars. Btw, sorry about being so snippety in my first reply :razz: For whatever reason in my version of the file that line is 269, not 23x, so I had no idea what you were talking about. Thanks though for the pointer (:lol:! ...On that note, it's quite fun to increment a char pointer named "C"... :smile:)

Okay, let's test this ASAP. I've upped the version to 2.03, pending confirmation of bug destroyed. While this has been going on, I've still been fiddling with the code here and there, in platform-independent manners, and it works here, but of course testing is still required.

By the way, if this was in fact the bug, then something like "Test=25994671" should have worked just fine. Did anybody try something like that, or just a valid expo with >=2 commas? (Again, the included .ini should be shipped with all binaries :smile:)

Edit: ToDo list:
1) Clean up FFT selection. Before the bug was reported I had made some progress on this front, and it should turn out easier than I had first thought.
2) To complement that, get exponent-by-exponent FFT specification ability, again probably via some field in the worktodo.txt line. Any ideas on how to actually do that? (The way Prime95 does it is kind of a pain in the butt to parse.)
3) If anyone would like a different way of determining whether or not there is an assignment key other than by comma count, please let me know.
[SUP][SUP][SUP][SUP][SUP][SUP][SUP][SUP][SUP][SUP][SUP][SUP][SUP][SUP][SUP][COLOR="White"]4) Bit shift? 0_o[/COLOR][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP][/SUP] |
[QUOTE=Dubslow;301300]
By the way, if this was in fact the bug, then something like "Test=25994671" should have worked just fine. Did anybody try something like that, or just a valid expo with >=2 commas? [/QUOTE]
Indeed, after reading your post I changed the test file shown in my previous post (edit: post #1360), by eliminating the "N/A," stuff. Here is the result:
[CODE]e:\CudaLucas>cl2024020x64 -d 0

------- DEVICE 0 -------
<snip>

Continuing work from a partial result of M216091 fft length = 12288 iteration = 110001
Iteration 120000 M( 216091 )C, 0x3981b56788b529e2, n = 12288, CUDALucas v2.02 err = 0.00415 (0:03 real, 0.2775 ms/iter, ETA 0:24)
Iteration 130000 M( 216091 )C, 0x80438af231f8fccd, n = 12288, CUDALucas v2.02 err = 0.004822 (0:02 real, 0.2792 ms/iter, ETA 0:22)
Iteration 140000 M( 216091 )C, 0x669382faea06df89, n = 12288, CUDALucas v2.02 err = 0.004822 (0:03 real, 0.2706 ms/iter, ETA 0:18)[/CODE]
edit2: next cosmetic thing would be to use a fixed format (6, 5, even 4 decimals would be enough) for error display, and 4 decimals for ETA. I mean to display the trailing zeroes. It looks ugly (misaligned) when the trailing zeroes are dropped: the whole row is shifted left. You can do that in v2.03, it is not difficult. |
[QUOTE=LaurV;301303]Indeed, after reading your post I changed the test file shown in my previous post (edit: post #1360), by eliminating the "N/A," stuff. Here is the result:
[CODE]e:\CudaLucas>cl2024020x64 -d 0

------- DEVICE 0 -------
<snip>

Continuing work from a partial result of M216091 fft length = 12288 iteration = 110001
Iteration 120000 M( 216091 )C, 0x3981b56788b529e2, n = 12288, CUDALucas v2.02 err = 0.00415 (0:03 real, 0.2775 ms/iter, ETA 0:24)
Iteration 130000 M( 216091 )C, 0x80438af231f8fccd, n = 12288, CUDALucas v2.02 err = 0.004822 (0:02 real, 0.2792 ms/iter, ETA 0:22)
Iteration 140000 M( 216091 )C, 0x669382faea06df89, n = 12288, CUDALucas v2.02 err = 0.004822 (0:03 real, 0.2706 ms/iter, ETA 0:18)[/CODE]
edit2: next cosmetic thing would be to use a fixed format (6, 5, even 4 decimals would be enough) for error display, and 4 decimals for ETA. I mean to display the trailing zeroes. It looks ugly (misaligned) when there are trailing zeroes, all row is shifted left. You can do that in v2.03, is not difficult.[/QUOTE]
Yick, printf formatting. The current formatter is "%3.4g" for the error; if I understand correctly, "%4.4g" will create a minimum field width of 4, with a maximum width of 4 for g. Will test now, hopefully can edit into previous post.

Edit: Well, that didn't work, but changing from g to f worked, so now the iteration time and round off share the same formatting of "%4.4f". Edited into previous post.
[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -t -k -c 200

Continuing work from a partial result of M25994671 fft length = 1474560 iteration = 11819502
Iteration 11819600 M( 25994671 )C, 0xda2bc6c08cb012d2, n = 1474560, CUDALucas v2.03 err = 0.0000 (0:02 real, 10.5765 ms/iter, ETA 41:38:42)
Iteration 11819800 M( 25994671 )C, 0x7a57b790bf30c653, n = 1474560, CUDALucas v2.03 err = 0.0820 (0:02 real, 5.8363 ms/iter, ETA 22:58:49)
Iteration 11820000 M( 25994671 )C, 0x5b7a72e76b58eff7, n = 1474560, CUDALucas v2.03 err = 0.0820 (0:01 real, 5.7476 ms/iter, ETA 22:37:50)
Iteration 11820200 M( 25994671 )C, 0x803f4a72d574de89, n = 1474560, CUDALucas v2.03 err = 0.0820 (0:01 real, 5.7128 ms/iter, ETA 22:29:35)
^C caught.
Writing checkpoint.[/code]
Edit3: Well, I missed the ETA portion of your comments. That one's kind of harder, because the number of fields changes. Do you mean that with 25 minutes left, you want it to read "ETA 00:25:42", padded with the zeros? Voilà le current code.
[code]//From apsen
void print_time_from_seconds (int sec)
{
  if (sec > 3600)
  {
    printf ("%d", sec / 3600);
    sec %= 3600;
    printf (":%02d", sec / 60);
  }
  else
    printf ("%d", sec / 60);
  sec %= 60;
  printf (":%02d", sec);
}[/code]
And what I think you mean:
[code]printf("%02d", sec / 3600);
sec %= 3600;
printf(":%02d", sec / 60);
printf(":%02d", sec % 60);
/* No ifs, elses, thens, or buts about it */[/code]
Edit2: If anyone else wants cosmetic or other changes in CUDALucas 2.03, just be sure to request them here before flash comes online and compiles it :smile: (PS Thanks again Mr. Jaget :smile:) |
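For reference, a small sketch of the always-padded hh:mm:ss variant discussed above, written to fill a buffer so it can be checked without capturing stdout. `format_eta` is a hypothetical name for this example, not a function in CUDALucas; the arithmetic is the same as in the "what I think you mean" snippet.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of print_time_from_seconds: always writes
 * hh:mm:ss, zero-padded, into buf. The hour field simply grows past
 * two digits for long runs. Returns what snprintf returns. */
int format_eta(char *buf, size_t size, int sec)
{
    assert(buf != NULL && size > 0 && sec >= 0);

    int hours   = sec / 3600;
    int minutes = (sec % 3600) / 60;
    int seconds = sec % 60;

    return snprintf(buf, size, "%02d:%02d:%02d", hours, minutes, seconds);
}
```

For example, 25 minutes 42 seconds (1542 seconds) comes out as "00:25:42", and 4 seconds as "00:00:04". Padding with spaces instead of zeros would only need "%2d" in place of the first "%02d".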
I may have talked too early... I am getting random "caught. Writing checkpoint." without pressing any ctrl c (and the message is not "^C caught.", just "caught."), and in case I get it to the end without any interruption, it will always report "cannot move tmp workfile to regular workfile" and crash. Subsequent launches can't find worktodo (which is now __woktodo).
|
[QUOTE=LaurV;301311]I may have talked too early... I am getting random "caught. Writing checkpoint." without pressing any ctrl c (and the message is not "^C caught.", just "caught."), and in case I get it to the end without any interruption, it will always report "cannot move tmp workfile to regular workfile" and crash. Subsequent launches can't find worktodo (which is now __woktodo).[/QUOTE]
Sheesh. That's a lot of crap. (That " caught..." is because "^C^C caught..." looked silly to me, but it looks silly now without the ^C...)
I don't know what could possibly be signalling the program, but that's what it looks like.
The way CUDALucas (and mfaktc, from which it's derived) clears assignments is by writing all the assignments except the one to be cleared to "__worktodo__.tmp", and then overwriting the regular work file with that.
[code]  if(remove(filename) != 0)
    return CANT_RENAME;
  if(rename("__worktodo__.tmp", filename) != 0)
    return CANT_RENAME;
  return OK;[/code]
[URL="http://pubs.opengroup.org/onlinepubs/009695399/functions/rename.html"]rename()[/URL] is a standard C library function. There's not much I can do to debug it. I hate to say it because it feels like ducking out, but that looks like OS issues...? I dunno. I can't do much about [URL="http://pubs.opengroup.org/onlinepubs/009695399/functions/signal.html"]signal()[/URL] and rename().
The signal() code is identical to previous versions of CUDALucas except for the string printed.
[code]signal (SIGTERM, SetQuitting);
signal (SIGINT, SetQuitting);[/code] |
[QUOTE=Dubslow;301314]
if(rename("__worktodo__.tmp", filename) != 0) [/QUOTE] well, the temp file is just "__worktodo.tmp". No "__" at the end. I did not look in the source yet (no time!) but are you sure you use the same name in both directions? :razz: (don't get me wrong, I really appreciate what you are doing and the fact that you are learning very fast, but if we continue like this, I may put my nose into the source code too, and fix it by myself, maybe I can find some time during the weekend). |
[QUOTE=LaurV;301315]well, the temp file is just "__worktodo.tmp". No "__" at the end. I did not look in the source yet (no time!) but are you sure you use the same name in both directions? :razz:
(don't get me wrong, I really appreciate what you are doing and the fact that you are learning very fast, but if we continue like this, I may put my nose into the source code too, and fix it by myself, maybe I can find some time during the weekend).[/QUOTE]
Definitely sounds like an OS issue. I really did copy and paste as much of it as possible, only removing the parts referring to bit levels and whatnot. The rest is identical.
The CUDALucas code I have:
[code]/************************************************************************************************************
 * Function name : clear_assignment
 *
 *     INPUT  : char *filename
 *              int exponent
 *     OUTPUT :
 *
 *     0 - OK
 *     3 - clear_assignment : cannot open file <filename>
 *     4 - clear_assignment : cannot open file "__worktodo__.tmp"
 *     5 - clear_assignment : assignment not found
 *     6 - clear_assignment : cannot rename temporary workfile to regular workfile
 *
 ************************************************************************************************************/
enum ASSIGNMENT_ERRORS clear_assignment(char *filename, int exponent)
{
  int found = 0;
  FILE *f_in, *f_out;
  LINE_BUFFER line; // line buffer
  char *tail = NULL; // points to tail material in line, if non-null
  enum PARSE_WARNINGS value;
  unsigned int line_to_drop = UINT_MAX;
  unsigned int current_line;
  struct ASSIGNMENT assignment; // the found assignment....

  f_in = _fopen(filename, "r");
  if (NULL == f_in)
    return CANT_OPEN_WORKFILE;

  f_out = _fopen("__worktodo__.tmp", "w");
  if (NULL == f_out)
  {
    fclose(f_in);
    return CANT_OPEN_TEMPFILE;
  }

  current_line =0;
  while (END_OF_FILE != (value = parse_worktodo_line(f_in,&assignment,&line,&tail)) )
  {
    current_line++;
    if (NO_WARNING == value)
    {
      if( (exponent == assignment.exponent) ) // make final decision
      {
        if (line_to_drop > current_line)
          line_to_drop = current_line;
        break;
      }
      else
      {
        line_to_drop = current_line+1; // found different assignment, can drop no earlier than next line
      }
    }
    else if ((BLANK_LINE == value) && (UINT_MAX == line_to_drop))
      line_to_drop = current_line+1;
  }

  errno = 0;
  if (fseek(f_in,0L,SEEK_SET))
  {
    fclose(f_in);
    f_in = _fopen(filename, "r");
    if (NULL == f_in)
    {
      fclose(f_out);
      return CANT_OPEN_WORKFILE;
    }
  }

  found = 0;
  current_line = 0;
  while (END_OF_FILE != (value = parse_worktodo_line(f_in,&assignment,&line,&tail)) )
  {
    current_line++;
    if ((NO_WARNING != value) || found)
    {
      if ((found) || (current_line < line_to_drop))
        fprintf(f_out, "%s", line);
    }
    else // assignment on the line, so we may need to print it..
    {
      found = (exponent == assignment.exponent);
      if (!found)
      {
        fprintf(f_out,"%s",line);
      }
      else // we have the assignment...
      {
        // Do nothing; we don't print this to the temp file, 'cause we're trying to delete it :)
      }
    }
  } // while.....

  fclose(f_in);
  fclose(f_out);
  if (!found)
    return ASSIGNMENT_NOT_FOUND;
  if(remove(filename) != 0)
    return CANT_RENAME;
  if(rename("__worktodo__.tmp", filename) != 0)
    return CANT_RENAME;
  return OK;
}[/code]
And the analogous code in mfaktc-0.18:
[code]/************************************************************************************************************
 * Function name : clear_assignment
 *
 *     INPUT  : char *filename
 *              unsigned int exponent
 *              int bit_min - from old assignment file
 *              int bit_max
 *              int bit_min_new - new bit_min,what was factored to--if 0,reached bit_max
 *     OUTPUT :
 *
 *     0 - OK
 *     3 - clear_assignment : cannot open file <filename>
 *     4 - clear_assignment : cannot open file "__worktodo__.tmp"
 *     5 - clear_assignment : assignment not found
 *     6 - clear_assignment : cannot rename temporary workfile to regular workfile
 *
 * If bit_min_new is zero then the specified assignment will be cleared. If bit_min_new is greater than
 * zero the specified assignment will be modified
 ************************************************************************************************************/
enum ASSIGNMENT_ERRORS clear_assignment(char *filename, unsigned int exponent, int bit_min, int bit_max, int bit_min_new)
{
  int found = FALSE;
  FILE *f_in, *f_out;
  LINE_BUFFER line; // line buffer
  char *tail = NULL; // points to tail material in line, if non-null
  enum PARSE_WARNINGS value;
  unsigned int line_to_drop = UINT_MAX;
  unsigned int current_line;
  struct ASSIGNMENT assignment; // the found assignment....

  f_in = fopen(filename, "r");
  if (NULL == f_in)
    return CANT_OPEN_WORKFILE;

  f_out = fopen("__worktodo__.tmp", "w");
  if (NULL == f_out)
  {
    fclose(f_in);
    return CANT_OPEN_TEMPFILE;
  }

  if ((bit_min_new > bit_min) && (bit_min_new < bit_max)) // modify only
    line_to_drop = UINT_MAX;
  else
  {
    current_line =0;
    while (END_OF_FILE != (value = parse_worktodo_line(f_in,&assignment,&line,&tail)) )
    {
      current_line++;
      if (NO_WARNING == value)
      {
        if( (exponent == assignment.exponent) && (bit_min == assignment.bit_min) && (bit_max == assignment.bit_max)) // make final decision
        {
          if (line_to_drop > current_line)
            line_to_drop = current_line;
          break;
        }
        else
        {
          line_to_drop = current_line+1; // found different assignment, can drop no earlier than next line
        }
      }
      else if ((BLANK_LINE == value) && (UINT_MAX == line_to_drop))
        line_to_drop = current_line+1;
    }
  }

  errno = 0;
  if (fseek(f_in,0L,SEEK_SET))
  {
    fclose(f_in);
    f_in = fopen(filename, "r");
    if (NULL == f_in)
    {
      fclose(f_out);
      return CANT_OPEN_WORKFILE;
    }
  }

  found = FALSE;
  current_line = 0;
  while (END_OF_FILE != (value = parse_worktodo_line(f_in,&assignment,&line,&tail)) )
  {
    current_line++;
    if ((NO_WARNING != value) || found)
    {
      if ((found) || (current_line < line_to_drop))
        fprintf(f_out, "%s", line);
    }
    else // assignment on the line, so we may need to print it..
    {
      found =( (exponent == assignment.exponent) && (bit_min == assignment.bit_min) && (bit_max == assignment.bit_max) );
      if (!found)
      {
        fprintf(f_out,"%s",line);
      }
      else // we have the assignment...
      {
        if ((bit_min_new > bit_min) && (bit_min_new < bit_max))
        {
          fprintf(f_out,"Factor=" );
          if (strlen(assignment.assignment_key) != 0)
            fprintf(f_out,"%s,", assignment.assignment_key);
          fprintf(f_out,"%u,%u,%u",exponent, bit_min_new, bit_max);
          if (tail != NULL)
            fprintf(f_out,"%s",tail);
        }
      }
    }
  } // while.....

  fclose(f_in);
  fclose(f_out);
  if (!found)
    return ASSIGNMENT_NOT_FOUND;
  if(remove(filename) != 0)
    return CANT_RENAME;
  if(rename("__worktodo__.tmp", filename) != 0)
    return CANT_RENAME;
  return OK;
}[/code] |
[QUOTE=Dubslow;301314]The signal() code is identical to previous versions of CUDALucas except for the string printed.[/QUOTE]
You may want to do some research on the sigaction() function (although there is no direct Windows equivalent). signal() is a portability nightmare. |
[QUOTE=chalsall;301318]You may want to do some research on the sigaction() function (although there is no direct Windows equivalent). signal() is a portability nightmare.[/QUOTE]
:shrug: It's what msft has been using, and no one's complained so far. (I'm guessing it's been in there a long time, since at least 1.2 in all likelihood.) |
[QUOTE=Dubslow;301319]:shrug: It's what msft has been using, and no one's complained so far. (I'm guessing it's been in there a long time, since at least 1.2 in all likelihood.)[/QUOTE]
:shrug: Just because it's being used doesn't necessarily mean it's correct in every environment. Once a month bug anyone? :wink: Quoting the man page for signal(2): [QUOTE]The only portable use of signal() is to set a signal’s disposition to SIG_DFL or SIG_IGN. The semantics when using signal() to establish a signal handler vary across systems (and POSIX.1 explicitly permits this variation); [B]do not use it for this purpose[/B].[/QUOTE] |
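For anyone curious what the sigaction() route looks like, here is a minimal POSIX-only sketch. The names `set_quitting`, `install_quit_handlers` and `quit_requested` are invented for this example (the real handler in the thread is SetQuitting, whose body is not shown here), and Windows would still need its own code path, as noted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flag set by the handler and polled by the main loop. */
static volatile sig_atomic_t quitting = 0;

static void set_quitting(int signo)
{
    (void)signo;
    quitting = 1;   /* only async-signal-safe work belongs in a handler */
}

/* Hypothetical replacement for the two signal() calls (POSIX only).
 * Returns 0 on success, -1 if a handler could not be installed. */
int install_quit_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = set_quitting;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* The iteration loop would check this between iterations. */
int quit_requested(void)
{
    return quitting != 0;
}
```

Unlike a handler set with signal(), one installed this way stays installed after it fires on every POSIX system, so a second Ctrl-C is handled the same way as the first.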
[QUOTE=Dubslow;301292]I didn't change a thing math-wise. Perhaps you chose a better FFT length? Edit: Could be new drivers, the 680 is still rather new. Or, if previously you were using arch>=2.0 but are now using arch=1.3, you might see some performance gains.
PS: Are you in Windows, and if so, does it crash for you like above?[/QUOTE] i used the compiled versions for windows by flashjh, without the worktodo-file edit: with worktodo.txt it will crash |
[QUOTE=Redarm;301334]i used the compiled versions for windows by flashjh, without the worktodo-file
edit: with worktodo.txt it will crash[/QUOTE] Okay, well if you read the intermediate posts, that bug has been fixed and now we're just waiting for flash to recompile. :smile: |
Found my issue. In v2.02 you have:
[CODE]enum ASSIGNMENT_ERRORS clear_assignment(char *filename, int exponent)
{
<snip>
[COLOR=Blue]#ifdef linux[/COLOR]
  f_in = fopen(filename, "r");
  if (NULL == f_in)
    return CANT_OPEN_WORKFILE;

  f_out = fopen("__worktodo__.tmp", "w");
  if (NULL == f_out)
  {
    fclose(f_in);
    return CANT_OPEN_TEMPFILE;
  }
[COLOR=Blue]#else[/COLOR]
  errno_t err;
  err = fopen_s(&f_in, filename, "r");
  if (err)
    return CANT_OPEN_WORKFILE;
  err=0;
  err = fopen_s(&f_out, "[COLOR=Red][B]__worktodo.tmp[/B][/COLOR]", "w");
  if (err)
  {
    fclose(f_in);
    return CANT_OPEN_TEMPFILE;
  }
[COLOR=Blue]#endif[/COLOR]
[/CODE]
This seems to be fixed in v2.03, I saw you made a new "_fopen()" and moved the OS-specific inside. Waiting for binaries of v2.03 (don't want to complicate the things even more by attempting a compilation by myself, I let flash do it, he has more experience :P).
Meantime 2.02 produced first correct residues, two DCs (all switches in command line, no ini used). |
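One way to make this kind of mismatch impossible is to spell the temporary file name in exactly one place. The sketch below is an assumption-laden illustration, not the v2.03 code: `WORKTODO_TMP` and `replace_workfile` are hypothetical names, and the helper takes the new contents as a string purely to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Defined once, so the open and the rename cannot drift apart. */
#define WORKTODO_TMP "__worktodo__.tmp"

/* Hypothetical helper: write new_contents to the temp file, then swap it
 * in for filename. Returns 0 on success, -1 on failure, and says why. */
int replace_workfile(const char *filename, const char *new_contents)
{
    assert(filename != NULL && new_contents != NULL);

    FILE *f_out = fopen(WORKTODO_TMP, "w");
    if (f_out == NULL) {
        fprintf(stderr, "cannot open %s: %s\n", WORKTODO_TMP, strerror(errno));
        return -1;
    }
    if (fputs(new_contents, f_out) == EOF) {
        fclose(f_out);
        remove(WORKTODO_TMP);
        return -1;
    }
    if (fclose(f_out) != 0)
        return -1;

    /* The Microsoft C runtime's rename() does not overwrite an existing
     * target, hence the remove() first. A missing target is not an error. */
    if (remove(filename) != 0 && errno != ENOENT) {
        fprintf(stderr, "cannot remove %s: %s\n", filename, strerror(errno));
        return -1;
    }
    if (rename(WORKTODO_TMP, filename) != 0) {
        fprintf(stderr, "cannot rename %s to %s: %s\n",
                WORKTODO_TMP, filename, strerror(errno));
        return -1;
    }
    return 0;
}
```

Printing strerror(errno) at the failing step would also have pointed straight at a "no such file" on the rename, rather than a generic CANT_RENAME.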
1 Attachment(s)
[QUOTE=LaurV;301340]Found my issue. In v2.02 you have:
[CODE]enum ASSIGNMENT_ERRORS clear_assignment(char *filename, int exponent)
{
<snip>
[COLOR=Blue]#ifdef linux[/COLOR]
  f_in = fopen(filename, "r");
  if (NULL == f_in)
    return CANT_OPEN_WORKFILE;

  f_out = fopen("__worktodo__.tmp", "w");
  if (NULL == f_out)
  {
    fclose(f_in);
    return CANT_OPEN_TEMPFILE;
  }
[COLOR=Blue]#else[/COLOR]
  errno_t err;
  err = fopen_s(&f_in, filename, "r");
  if (err)
    return CANT_OPEN_WORKFILE;
  err=0;
  err = fopen_s(&f_out, "[COLOR=Red][B]__worktodo.tmp[/B][/COLOR]", "w");
  if (err)
  {
    fclose(f_in);
    return CANT_OPEN_TEMPFILE;
  }
[COLOR=Blue]#endif[/COLOR]
[/CODE]
This seems to be fixed in v2.03, I saw you made a new "_fopen()" and moved the OS-specific inside. Waiting for binaries of v2.03 (don't want to complicate the things even more by attempting a compilation by myself, I let flash do it, he has more experience :P).
Meantime 2.02 produced first correct residues, two DCs (all switches in command line, no ini used).[/QUOTE]
Ah. In that case, it was [i]probably[/i] just a transcription error on flash's part when he inserted the fopen_s. When I redefined all that crap under _fopen, I deleted all the Windows #ifdefs and used the Linux ones, and apparently I deleted the error as well. :smile: By the way throughout this mess I think I've had 3 for 3 with my hacks.

By the other way, was I correct about you wanting print_time_from_seconds to always pad the ETA with zeros to a constant length?

PS Try hitting the ^C yourself. :smile:
___________________________________________________________________
PPS
[QUOTE=LaurV;301348]No idea what that means till I will see it in action :blush: The "err=" and "ms/iter" is what I was thinking of. The ETA looks good as it is in v2.02. If you pad it with zeroes or better spaces up to 3:2:2 digits (in case is shorter) it would be perfect.
But this is just nitpicking...[/QUOTE]
[QUOTE=LaurV;301303] edit2: next cosmetic thing would be to use a fixed format (6, 5, even 4 decimals would be enough) for error display, and 4 decimals for [U]ETA[/U]. [/QUOTE]
The ms/iter was already at a steady field width; like I said in my original response, err is also now at a fixed field width. ETA would look like
15 hrs: "15:42:24"
5 hrs: "05:42:24"
0 hrs 42 mns: "00:42:24"
7 mins "00:07:24"
24 seconds "00:00:24"
4 seconds "00:00:04"
I personally think 3:2:2 is a bit excessive, since the vast majority of users do DCs.
___________________________________________________________________
PPPS
[QUOTE=LaurV;301349]Remember I am still running v2.02 :P Most probably it will say "caught. Bla Bla" without "^C". I don't want to interrupt the work right now only for that. edit: could not resist... I was right. Maybe you have done some changes in v2.03, but they didn't reach me yet...[/QUOTE]
Hmm... it should be "^C caught. ..." because when you hit ^C yourself, it appears on the terminal, then " caught..." makes it look like one print. (And no, I haven't changed that part since I initially made 2.00a.)
___________________________________________________________________
PPPPS
[QUOTE=LaurV;301348]ETA is already 2:2:2, so if you think 3:2:2 is too much, then there is nothing to do here. For me it looks ok either way.[/QUOTE]
Well if it gets below an hour, the hour field isn't printed at all and we get a 2:2, not a 2:2:2. That's what I was wondering about.
[code]//From apsen
void print_time_from_seconds (int sec)
{
  if [U](sec > 3600)[/U]
  {
    printf ([U]"%d", sec / 3600[/U]);
    sec %= 3600;
    printf (":%02d", sec / 60);
  }
  [U]else
    printf ("%d", sec / 60);[/U]
  sec %= 60;
  printf (":%02d", sec);
}[/code]
Reattaching 2.03 from previous page. |
[QUOTE=Dubslow;301344]
By the other way, was I correct about you wanting print_time_from_seconds to always pad the ETA with zeros to a constant length?[/QUOTE] No idea what that means till I will see it in action :blush: The "err=" and "ms/iter" is what I was thinking of. The ETA looks good as it is in v2.02. If you pad it with zeroes or better spaces up to 3:2:2 digits (in case is shorter) it would be perfect. But this is just nitpicking... Thanks for the PM, my post count is inflated too, but who cares :D, I saw your reply, I wrongly wrote ETA there. I was thinking to ms/iter with 4 decimals, fixed. ETA is already 2:2:2, so if you think 3:2:2 is too much, then there is nothing to do here. For me it looks ok either way. |
[QUOTE=Dubslow;301344]PS Try hitting the ^C yourself. :smile:[/QUOTE]
Remember I am still running v2.02 :P Most probably it will say "caught. Bla Bla" without "^C". I don't want to interrupt the work right now only for that. edit: could not resist... I was right. Maybe you have done some changes in v2.03, but they didn't reach me yet... |
CUDALucas 2.03 x64 Binaries
1 Attachment(s)
Attached CUDALucas 2.03 x64 binaries - Tested
- I was able to run with or without .ini file
- Worktodo.txt works fine.
- Command line still takes precedence
- Test=XXXXXXXX in worktodo.txt works fine
- Test=N/A,XXXXXXXX,XX,X works fine
- Test=AID,XXXXXXXX,XX,X works fine
- I tested DoubleCheck with all these also, they work fine

@Dubslow - everything compiled straight out, but the makefile.win needed a small change on line 6:
OUT = NAME -> OUT = $(NAME)
and I still need /Tp for now

This is CUDA 4.0 | sm_20 & sm_21 (see next posts for CUDA 3.2 & source)

Edit: With .ini and worktodo.txt in directory, I can run just CUDALucas.exe and it works fine. BTW - Thanks for all the hard work everyone!

Edit2: I just edited my worktodo.txt with M86243 and restarted with CUDALucas.exe:
[CODE]
M( 86243 )P, n = 4608, CUDALucas v2.03

Continuing work from a partial result of M26105XXX fft length = 1572864 iteration = 13559297
[/CODE]
It successfully found the Prime, cleared the exponent from worktodo.txt and continued with my current exponent - awesome! |
CUDALucas 2.03 x64 Binaries
1 Attachment(s)
CUDALucas 2.03 x64 CUDA 3.2 | sm_13
|
CUDALucas 2.03 x64 Source
1 Attachment(s)
CUDALucas 2.03 x64 Source with updated makefile.win
|
CUDALucas 2.03 x64 Binaries
1 Attachment(s)
CUDALucas 2.03 x64
This is for GTX 680 with CUDA 4.2 | sm_30. If someone has a 680, can you test this with other versions and let me know how it works? Thanks |
AFAICT, there is no difference in the math section between 2.01 and 2.03 (I'm on Linux)
Am I correct? Luigi :smile: |
[QUOTE=flashjh;301358]CUDALucas 2.03 x64
This is for GTX 680 with CUDA 4.2 | sm_30. If someone has a 680, can you test this with other versions and let me know how it works? Thanks[/QUOTE]
[CODE]
Continuing work from a partial result of M65xxxxxx fft length = 3670016 iteration = 280001
Iteration 290000 M( 65434657 )C, 0x32166eaa07042500, n = 3670016, CUDALucas v2.03 err = 0.1731 (1:07 real, 6.7623 ms/iter, ETA 122:21:37)
[/CODE] |
[QUOTE=ET_;301359]AFAICT, there is no difference in the math section between 2.01 and 2.03 (I'm on Linux)
Am I correct? Luigi :smile:[/QUOTE] Yep :smile: I'm so glad this works now. |
[QUOTE=flashjh;301353] I still need /Tp for now[/QUOTE]
For this you have a choice. Either keep the /Tp in (no harm I can see) or you need to declare every function prototype called from cu files as extern "C" blah(). I think what's going on is this. .cu files end up being compiled as C++ files on Windows. Thus, when it sees a prototype it expects it to use C++ linkage & name mangling. But when you build .c files, MSVC assumes it's just regular C so the mangling and so on doesn't happen. That's why the missing function names at link time look like C++ mangled names - that's what they're being translated to in the .cu file. But the short version is I don't see any harm in just building everything as C++ code using /Tp |
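The linkage point above can be illustrated with the usual header guard. This is only a sketch, not the real parse.h: `count_commas()` is a hypothetical helper made up for the example. With the guard in place the same prototype can be included from a file built as C++ (such as a .cu file) and from a file built as plain C, and both sides agree on the unmangled symbol name.

```c
/* Sketch of a header that is safe to include from both C and C++.
 * count_commas() is a hypothetical helper, not a CUDALucas function. */

#ifdef __cplusplus
extern "C" {            /* C++ callers: refer to the unmangled C symbol */
#endif

int count_commas(const char *line);

#ifdef __cplusplus
}
#endif

/* Definition, as it would appear in a .c file built as plain C. */
int count_commas(const char *line)
{
    int n = 0;
    for (; *line != '\0'; ++line) {
        if (*line == ',')
            ++n;
    }
    return n;
}
```

Building everything with /Tp sidesteps the issue by making every file C++, which is why both approaches end up linking.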
[QUOTE=Dubslow;301371]Yep :smile:
I'm so glad this works now.[/QUOTE]
Has anyone created a worktodo file with more than two exponents (lines)? Something fake, which will finish very fast, like:
[CODE]
Test=N/A,150001,69,1
Test=N/A,150003,69,1
Test=N/A,150007,69,1
Test=N/A,150009,69,1
[/CODE]
or simpler
[CODE]
Test=150001
Test=150003
Test=150007
Test=150009
[/CODE]
Then run cl_2.03 on it? After the SECOND exponent finishes (17-20 seconds on a gtx580), the file will look like
[CODE]
Test=150003Test=150007Test=150009
[/CODE]
And a parsing error is issued. Something close to:
[CODE]
Warning: ignoring line 1: "Test=N/A,150003,69,1Test=N/A,150007,69,1Test=N/A,150009,69,1" in "worktodo_1.txt". Reason: invalid format.
No valid assignment found.
[/CODE]
It seems like the CR/LF characters are lost... Looking summarily into the source of parse.c, around lines 473 and 480, it does not seem that the variable called "line" contains the "\n" character. One "\n" should be printed in the "%s[U][B]\n[/B][/U]" format specifier... Or not?**

edit: and by the way, another 2 good residues with v2.02, making the score 4 to 0.
[code]
Processing result: M( 26299261 )C, 0x7a7f02229ab66f20, n = 1474560, CUDALucas v2.02
LL test successfully completes double-check of M26299261
Processing result: M( 26331001 )C, 0xfc7b1d44c505c281, n = 1474560, CUDALucas v2.02
LL test successfully completes double-check of M26331001
[/code]
I switched to 2.03 this evening (half an hour ago, another two expos, ETA ~18h).

---------------
***(edit 2: I saw the ini file only has LF at the end of the lines. It occurs to me that \n in Windows is in fact \r\n, that is a 0x0D followed by a 0x0A, or CR followed by LF. In Linux it seems an LF is enough. If I deleted all 0x0A characters from the worktodo, using Programmer's Notepad (PN2), it then worked ok. This could be a real problem, as all worktodo files generated by gpu272 or primenet have CR+LF as eol terminators, and I don't see myself doing manual replacement all the time.
PN2 and Eclipse have autoreplace, "set eol as LF only", but I still need to open the file and then save it by hand... Now this made me curious how mfaktc deals with this, because there I never had any such problem) |
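On the CR+LF versus LF question above: one common way to make a line reader indifferent to which terminator a file uses is to strip both characters after reading. The helper below is a sketch under that assumption; `strip_eol()` is a hypothetical name and is not taken from parse.c.

```c
#include <string.h>

/* Remove any trailing '\r' and '\n' characters in place and return the
 * new length. Handles "...\n", "...\r\n" and lines with no terminator.
 * Hypothetical helper for illustration, not part of CUDALucas. */
size_t strip_eol(char *line)
{
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
    return len;
}
```

Called right after fgets(), this leaves the same in-memory string whether the worktodo file was saved on Windows or Linux, so no hand conversion in an editor would be needed.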
Oh, the usual CR/LF/\r\n thing. (Note that Mac's end-of-line is simply \r.)
A portable code(r) needs to know these things. It's a good thing that this code is not widely used. ::razz:: Makes it a good candidate for sandbox coding. ::razz again:: Seriously though! This project seems to have outgrown the ad hoc posting-and-tons-of-spaghetti-code stage. Pick a respectable admin (possibly the original author), start a git/svn repository, start real life tracking of bugs, rollbacks, code reviews. Just a suggestion. Of course you can continue doing what you are doing. |
M( 26200729 )C, 0x02c4c8ac57869fae, n = 1572864, CUDALucas v2.01
M( 26227489 )C, 0x8e52cc33256a5bca, n = 1572864, CUDALucas v2.01 Both confirmed. Luigi |
Updated 2.03 x64 binaries
1 Attachment(s)
[QUOTE=kjaget;301432]For this you have a choice. Either keep the /Tp in (no harm I can see) or you need to declare every function prototype called from cu files as extern "C" blah().
I think what's going on is this. .cu files end up being compiled as C++ files on Windows. Thus, when it sees a prototype it expects it to use C++ linkage & name mangling. But when you build .c files, MSVC assumes it's just regular C so the mangling and so on doesn't happen. That's why the missing function names at link time look like C++ mangled names - that's what they're being translated to in the .cu file. But the short version is I don't see any harm in just building everything as C++ code using /Tp[/QUOTE] I plan to fix it when I get more time [QUOTE=LaurV;301449]Have anyone created a worktodo file with more then two exponents (lines)? Something fake, which will finish very fast, like: [CODE] Test=N/A,150001,69,1 Test=N/A,150003,69,1 Test=N/A,150007,69,1 Test=N/A,150009,69,1 [/CODE]or simpler [CODE] Test=150001 Test=150003 Test=150007 Test=150009 [/CODE]Then run cl_2.03 on it? After SECOND exponent finishes (17-20 seconds on gtx580), the file will look like [CODE] Test=150003Test=150007Test=150009 [/CODE]And a parsing error is issued. Something close to: [CODE]Warning: ignoring line 1: "Test=N/A,150003,69,1Test=N/A,150007,69,1Test=N/A,150009,69,1" in "worktodo_1.txt". Reason: invalid format. No valid assignment found.[/CODE]It seems like the CR/LF characters are lost... Looking summarily into the source of parse.c, around the lines 473, 480, it does not seem that variable called "line" contains the "\n" character. One "\n" should be printed in "%s[U][B]\n[/B][/U]" format specifier... Or not?**[/QUOTE] You are correct. The \n is needed for Windows, but Linux will compile it correctly also. In Linux the file will contain a single LF to indicate the newline; under windows, it will contain a CRLF. Attached updated 2.03 to correct error - tested, but please test again ;) [QUOTE=Batalov;301458]... Seriously though! This project seems to have outgrown the ad hoc posting-and-tons-of-spaggetti-code stage. 
Pick a respectable admin (possibly the original author), start a git/svn repository, start real life tracking of bugs, rollbacks, code reviews. Just a suggestion. Of course you can continue doing what you are doing.[/QUOTE]
I agree. msft, dubslow, myself are all possible. I plan to be around for a while. @Dubslow and msft, any ideas?

Attached file is .zip. Inside are all source files (with updated parse.c) and all CUDA builds. I used .7z ([URL="http://www.7-zip.org/download.html"]7-zip[/URL]) because it compresses a lot better for the binaries. In the future, I'll use that if the files are too big (or until we move to a repository).

Included CUDALucas x64 binaries:
CUDA 3.2 | sm_13
CUDA 4.0 | sm_20
CUDA 4.0 | sm_21
CUDA 4.1 | sm_21
CUDA 4.2 | sm_30 |
Actually, '\n' as a format specifier guarantees that the proper line endings will be filled in by the compiler. gcc will literally do just '\n', whereas MSVC will use a literal '\r\n' whenever a '\n' is specified in print statements.

The problem is, when you parse_worktodo_value() in line 432, one of the arguments is a pointer to a copy of the line; however, I changed that copy going from 2.02 to 2.03 to [i]not[/i] include the '\n' because I wanted to be able to print the line-copy in warning messages without the newline, e.g. line 365
[code]ignoring line %u: \"%s\" in \"%s\".[/code]
With a newline in the string, that message would look bad, so I removed it (lines 206-207) without realizing it was used in clear_assignment(). The solution is in fact to add a '\n' as LaurV suggested, and the compilers will take care of any platform specific line endings. (Furthermore, when a line is read in that ends with '\r\n', it is automatically converted to simply '\n' in memory, so any and all code only needs to bother with '\n's. Also Batalov, hasn't Apple switched from the '\r' a while ago? Edit: The Wiki article below says they switched after Mac OS 9.)

flash's fix is correct, however, considering my n00bosity, a SourceForge would be a good idea. msft, do you have any particular cares about setting one up?

[url]http://en.wikipedia.org/wiki/Newline[/url]
[quote=Wikipedia/Newline] The C programming language provides the escape sequences '\n' (newline) and '\r' (carriage return). However, these are not required to be equivalent to the ASCII LF and CR control characters. The C standard only guarantees two things: 2. When writing a file in text mode, '\n' is transparently translated to the native newline sequence used by the system, which may be longer than one character. When reading in text mode, the native newline sequence is translated back to '\n'. In binary mode, no translation is performed, and the internal representation produced by '\n' is output directly.[/quote]

Edit: I forgot the other big thread there :razz: Regarding the C/C++ thing, kjaget is exactly right, and that's why #ifdef linux then all the functions are declared that way; the url in the comments of that section has basically the same explanation. I didn't realize that it only worked on Windows because it thought it was C++. Are there any dangers in our code where a C++ compiler would miscompile valid C? I know they're not [i]entirely[/i] compatible, and that some incompatibilities exist, and therefore we should watch out for them. |
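To make the failure mode described above concrete, here is a rough sketch of rewriting a work file after the first assignment finishes. It is not the actual clear_assignment() code; `copy_without_first_line()` is a hypothetical stand-in. The point is only that a line kept in memory without its newline must be written back with an explicit '\n', otherwise the remaining assignments run together on one line.

```c
#include <stdio.h>
#include <string.h>

/* Copy every line of `in` except the first to `out` and return the
 * number of lines kept. Lines are held without their terminator, so the
 * "%s\n" on output is what keeps the assignments on separate lines.
 * Hypothetical sketch; lines longer than the buffer would be split. */
int copy_without_first_line(FILE *in, FILE *out)
{
    char line[256];
    int lineno = 0;
    int kept = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* in-memory copy has no EOL */
        if (++lineno == 1)
            continue;                        /* drop the finished assignment */
        fprintf(out, "%s\n", line);          /* re-add the newline on output */
        ++kept;
    }
    return kept;
}
```

Writing "%s" instead of "%s\n" in the fprintf call would give output of the "Test=150003Test=150007" kind reported earlier in the thread.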
[QUOTE=Dubslow;301469]Regarding the C/C++ thing, kjaget is exactly right, and that's why #ifdef linux then all the functions are declared that way; the url in the comments of that section has basically the same explanation. I didn't realize that it only worked on Windows because it thought it was C++. Are there any dangers in our code where a C++ compiler would miscompile valid C? I know they're not [i]entirely[/i] compatible, and that some incompatibilities exist, and therefore we should watch out for them.[/QUOTE]
I agree and that's why I plan to fix it. |
[QUOTE=flashjh;301471]I agree and that's why I plan to fix it.[/QUOTE]
In that case, the simplest method would be to remove the function declarations from parse.h, and then remove the #ifs from around the 'extern "C"'s already there. I'll set up a SourceForge :razz: (It shouldn't change the binaries.) |
Does anyone know what license CUDALucas is under? I couldn't find any licensing on MacLucasFFTW.c, MacLucasUNIX.c or lucdwt.c, though the last of these has an "All Rights Reserved" in it. (In the process of looking through these, I also discovered that "George Woltman's lightning-fast Pentium program has its roots in this code [lucdwt.c]". :smile:) I also didn't see anywhere that msft had put a particular license on it.
Strictly speaking, due to the mfaktc code I added, v2.02+ should be under the GPL, though I doubt TheJudger/Christenson would mind if it were under some other license. (Edit: Actually because of timeval.c even versions before 2.02 should be GPL. I'm not sure when timeval.c was added.) Edit: About C/C++ and headers, I was able to move the 'extern "C" ...' declarations to parse.h, and this time I was able to link. (I thought I had tried this before, but I guess not.) There are no longer any #ifdefs in that regard. (I'll see about posting this on SourceForge soon.) |
After much fighting with SF Beta and SVN, there is now a CUDALucas SourceForge [URL="https://sourceforge.net/projects/cudalucas/"]page[/URL]. :smile:
In addition, timeval.c is gone entirely, not to mention various changes to the defines and includes. Any binaries compiled should be identical, but flash should also test and make sure this newer stuff does actually compile. Does anybody feel like writing a README? (I've marked it as GPL, but that's certainly open to discussion.) msft and flash, please report your SF usernames so I can add you. Anyone else is welcome to join as a "Member". (Or Developer if you ask nicely. :smile:) |
Well. I am nitpicking again. (I learned this word here on this forum :P)
Now the toy seems to work as expected. There is still a small BIG issue, which affects all users with more than one card and/or the users who want to run multiple copies of the software. Clearly they can't share the same worktodo; many conflicts could arise from it, the third world war, etc., when CL would write in the file with both instances. We can't specify two different files in the config, as there is only one config, and this raises the question of how the instances of CL would discern which config is whose. The only solution is to do it mfaktc's way: put them in separate folders. But now we have a different headache: we have to surf through all those folders to find the result files to report them. Thank God that it is only "one time per day" output, and we have only 2 or 3 (max 4) folders like this, but what should we do if we have 20 gtx9980 cards (zefler architecture) and each of them runs 5 CL instances and each instance outputs one DC every 5 minutes... dream on!...

Well, short story: wouldn't it be nice if we could customize the output folder from the ini file? And use "ResultsFile=..\result", so all instances add their result lines to the same file, in the parent folder, where all folders are... Wait to open it if busy (opened for write by another process), and we solve the access issue too.. and keep all results in one place...

For mfaktc I solve this with batches; anyhow I should create some batches for CL, just in case :razz:
[CODE]
copy /b allresults.txt+cl0\results.txt
del cl0\results.txt
copy /b allresults.txt+cl1\results.txt
del cl1\results.txt
copy /b allresults.txt+cl2\results.txt
del cl2\results.txt
etc
[/CODE]
and launch it from time to time...

Another nitpicking: put back the "CTRL+C" in the string. It will look better with "^C CTRL+C detected. Bla bla" (on your computer, where ^C is printed when it is caught) than "caught. Bla bla" (on mine, where it is not). I have the feeling I did something wrong and he caught me...
And anyhow, some people don't have any idea what ^C is. Something raised to the power of a complex set? (nerd!!):smile:

And a third nitpicking: Write the times somewhere, on the screen, in the result file, whatever. If it is complicated to compute them, it could be a plain separate line: "Time=21:30:17.125" and write the wall clock; it is not really necessary to write "this test took hh:mm:ss.lll" as mfaktc is doing. Just printing the wall clock at the beginning and at the end of each expo would be enough to have an idea how long the task REALLY took. The ETA is just an estimation, and not always accurate; it can't know how busy my computer will be tonight if my boss gave me homework, or if I am going to play puzzlepirates or to trade forex. Didn't say nothing about watching sexy movies with Mrs LaurV at the same time... |
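The "wait if busy" idea for a shared results file could be approximated with a lock file. The sketch below is only one possible approach and rests on stated assumptions: `append_result()` and the lock-file path are hypothetical, and it relies on the C11 exclusive-create mode "wx" of fopen(), which the C library in use has to support. A caller that gets -1 would sleep briefly and retry.

```c
#include <stdio.h>

/* Append one line to a shared results file, guarded by a lock file.
 * Returns 0 on success, -1 if another instance holds the lock or the
 * write failed. Hypothetical sketch: a crashed instance would leave a
 * stale lock file behind, which this code does not try to detect. */
int append_result(const char *results_path, const char *lock_path,
                  const char *line)
{
    int rc = -1;
    FILE *out;
    FILE *lock = fopen(lock_path, "wx");  /* C11: fails if the file exists */

    if (lock == NULL)
        return -1;                        /* busy: caller should retry later */
    fclose(lock);

    out = fopen(results_path, "a");
    if (out != NULL) {
        if (fprintf(out, "%s\n", line) >= 0)
            rc = 0;
        if (fclose(out) != 0)
            rc = -1;
    }
    remove(lock_path);                    /* release the lock */
    return rc;
}
```

With something like this, every instance in its own folder could point at the same file in the parent folder, and the periodic copy/del batch would no longer be needed.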
[QUOTE=LaurV;301515]Well. I am nitpicking again. (I learned this word here on this forum :P)
Now the toy seems to work as expected. There is still a small BIG issue, which affects all users with more then one card and/or the users who want to run multiple copies of the software. Clearly they can't share the same worktodo, many conflicts could arise form it, the third world war, etc. when CL would write in the file with both instances. We can't specify two different files in the config, as there is only one config, and this raise the question how the instances of CL would discern which config is whose. The only solution is to do it mfaktc's way: put them in separate folders. But now we have a different headache, we have to surf through all those folders to find the result files to report them. God helps that is only "one time per day" output, and we have only 2 or 3 (max 4) folders like this, but what should we do if we have 20 gtx9980 cards (zefler architecture) and each of them runs 5 CL instances and each instance outputs one DC every 5 minutes... dream on!... Well, short story: wouln't be nice if we can customize the output folder from the ini file? and use "ResultsFile=..\result", so all instances add their result lines to the same file, in the parent folder, where all folders are... Wait for open it, if busy (opened for write by another process), and we solved the access issue too.. and keep all results in one place... For mfaktc I solve this with batches, anyhow I should create some batches for CL, just in case :razz: [CODE]copy /b allresults.txt+cl0\result.txt del cl0\results.txt copy /b allresults.txt+cl1\result.txt del cl1\results.txt copy /b allresults.txt+cl2\result.txt del cl2\results.txt etc [/CODE]and launch it from time to time...[/quote] The idea that occurred to me turned out to be the same as the idea that occurred to Bdot, so I'll move PrintDeviceInfo off of '-i' and use that option to specify an ini file instead; ResultsFile will be a new option. (Does anyone need a command line switch for that as well?) 
[QUOTE=LaurV;301515] Another nitpicking: put back the "CTRL+C" in the string. It will look better with "^C CTRL+C detected. Bla bla" (on your computer where ^c is printed when is caught) then "caught. Bla bla", (in mine, where is not). I have the feeling I did something wrong and he caught me... And anyhow, some people don't have any idea what ^C is. Something raised at the power of a complex set? (nerd!!):smile:[/quote]What I mean is that when I press CTRL+C, those two characters '^C' literally appear on my screen, just like 'a' appears when I press the a key. Regardless though, I'll change the message. [QUOTE=LaurV;301515] And a third nitpicking: Write the times somewhere, on the screen, on the result file, whatever. If it is complicate to compute them, it could be a plain separate line: "Time=21:30:17.125" and write the wall clock, it is not really necessary to write "this test took hh:mm:ss.lll" as mfaktc is doing. Just printing the wall clock at the beginning and at the end of each expo would be enough to have one idea how much the task REALLY took. The ETA is just an estimation, and not always accurate, he can't know how busy my computer will be tonight if my boss gave me homework, or if I am going to play puzzlepirates or to trade forex. Didn't say nothing about watching sexy movies with Mrs LaurV in the same time...[/QUOTE] This would be harder; I think I'll eventually incorporate Bdot's additions to mfakto 0.11 to allow customizable checkpoint lines. Changes won't be immediate. PS Can somebody post any and all .dll's they use to run CUDALucas? Also, flash, we should really pick one or two archs/CUDA versions. You can compile any arch against any version that supports it -- there's not much difference between (for example) 3.2|1.3 and 4.0|1.3, etc. mfaktc 0.18 as far as I can tell was only compiled against CUDA 4.0 libs; it's probably best if we just use the latest CUDA version. 
Apparently, judging from mfaktc's make files, you can compile code for more than one arch/compatibility into one executable; like I've said before though, msft's makefile only used arch=1.3. I'll look into nvcc options. |
[QUOTE=Dubslow;301532]This would be harder; I think I'll eventually incorporate Bdot's additions to mfakto 0.11 to allow customizable checkpoint lines. Changes won't be immediate.[/QUOTE]
I wasn't talking about something so complicated. Checkpoint lines are perfect as they are. I'd just like to have a line printed, "this test took xxx hours (from the last restart).", when a test is finished. This could be easy: read the clock when you start an exponent (or resume the program, it does not really matter), read it again when you finish, and print the difference when you print the final "M( 859433 )P, n = 49152, CUDALucas v2.02" line. Print it on the screen and in the result file too. That is all.

edit: About the different exe files, for my 580 the 40_20 is fastest. Double speed compared with 32_13. For other cards, people may need different versions. I remember the v1.3 used to be faster when compiled with 32_13, about 25% faster than 40_20. Also, 41_ was always slower for my card, but people having sm_21 may say differently.

And to end on a positive note: two residues matched with 2.03 (started last night, interrupted today, replaced 2.03-no-LF with 2.03-with-LF, and resumed). The result:
[CODE]
Processing result: M( 26306983 )C, 0xfd9d34cf4aa8db5d, n = 1474560, CUDALucas v2.03
LL test successfully completes double-check of M26306983
CPU credit is 26.1608 GHz-days.
Processing result: M( 26331029 )C, 0x06226c8e8e381896, n = 1474560, CUDALucas v2.03
LL test successfully completes double-check of M26331029
CPU credit is 26.1847 GHz-days.
[/CODE] |
[QUOTE=LaurV;301542]I wasn't talking about something so complicate. Checkpoint lines are perfect as they are. I'd just like to have printed a line "this test took xxx hours (from the last restart)." when a test is finished. This could be easy, read the clock when you start an exponent (or resume the program, it does not really matter) read it again when you finish, print the difference when you print the final "M( 859433 )P, n = 49152, CUDALucas v2.02" line. Print it on the screen and in the result file too. That is all.
[/QUOTE] Misunderstood that, you're right, this should be simpler to do. I'll postpone the customizable-line thing until somebody asks for it. |
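The "read the clock at start, read it again at the end, print the difference" request could look roughly like this. It is a sketch only: `format_elapsed()` and `elapsed_between()` are hypothetical names, and the h:mm:ss layout is an assumption, not what any CUDALucas version actually prints.

```c
#include <stdio.h>
#include <time.h>

/* Format a non-negative duration in seconds as h:mm:ss.
 * Hypothetical helper for illustration. */
void format_elapsed(double seconds, char *buf, size_t buflen)
{
    long total = (long)(seconds + 0.5);   /* round to whole seconds */
    snprintf(buf, buflen, "%ld:%02ld:%02ld",
             total / 3600, (total / 60) % 60, total % 60);
}

/* Wall-clock difference between two time() readings, formatted. */
void elapsed_between(time_t start, time_t end, char *buf, size_t buflen)
{
    format_elapsed(difftime(end, start), buf, buflen);
}
```

In use, the program would store `time(NULL)` when an exponent starts or resumes and call `elapsed_between()` with a fresh `time(NULL)` when the final result line is printed, which gives the time since the last restart as requested.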
One accidental side effect of having a SourceForge is that there's more publicity for it. 7 downloads in two days [url]https://sourceforge.net/projects/cudalucas/files/stats/timeline[/url] , though I'm pretty sure at least the Thailand download was LaurV :smile:. Is anybody on here from Austria or Trinidad and Tobago (or use such a proxy)?
|
It was not me, I visited the page but only could see the ini file, so I did not dld anything.
edit: visited now again, some files appeared. Still could not see any source codes, and the libs are only for linux. If ye are going to put libs (not really necessary, but if) then better put all of them, windoze included. Or are they inside of (each) (packed self extract?) exe file? (I did not dld to check). |
I've asked flash about the windows libs. As for code, it's one of the other tabs besides "Files".
|
[QUOTE=Dubslow;301626]I've asked flash about the windows libs. As for code, it's one of the other tabs besides "Files".[/QUOTE]
Ok, saw it now. I did the best thing I could do, gave it one recommendation. :tu: I have all the windows libraries except cudart42 (no gtx680 yet on my fingers) in my comp at home and I can give you all in case flash does not appear till I reach home tonight (I still have to work for about 5-6 hours, now I am in the lunch break). |
Just zip them up and attach them here, please. :smile:
|
[QUOTE=Dubslow;301631]Just zip them up and attach them here, please. :smile:[/QUOTE]
For mfaktc I need some .dll files in the directory, but for CUDALucas I don't; which .dll files are you looking for? I can post whatever you need also ;) |
[QUOTE=flashjh;301649]For mfaktc I need some .dll files in the directory, but for CUDALucas I don't; which .dll files are you looking for? I can post whatever you need also ;)[/QUOTE]
Umm... you should need cudart.dll and cufft.dll. They are definitely necessary, so they're somewhere on your computer. |
Okay, posted a README (outline stolen from mfaktc (again :smile:)) and I'd like a sanity/grammar check on it, especially section 3. It's visible in-browser [URL="https://sourceforge.net/projects/cudalucas/files/"]here[/URL] (look below the list of files). Edit: It's written for someone who's just browsing SourceForge and doesn't know the first thing about GIMPS. They should be able to read it and operate CUDALucas without any sort of problems.
PS I'm still looking for Windows .dlls, LaurV? :razz: |
[QUOTE=Dubslow;301829]PS I'm still looking for Windows .dlls, LaurV? :razz:[/QUOTE]
Well, Saturday evening is shopping... otherwise nothing to eat next week and I would die by starvation... However, you are not so lucky to get rid of me so fast. I packed all cudartXX_XX_XX.dll I ever used and could find in my computers, there is a cufftXX_XX_XX.dll correspondent for each of them, but those are huge (>20Megs each) and I don't know how to attach them, or where to drop them. Newest versions can be dld from nVidia site. Does anyone still use older versions? Where should I put them so you can take them? |
Um, just 3.2.x, 4.0.x, and 4.1.x, one example of each, should be sufficient. If you put them all in a zip, that [i]should[/i] be attachable? (Otherwise try e.g. [url]www.filesmelt.com[/url])
|
There is no way to attach them here, compressed or not, due to 240k limit. Each cudart is about 600k, and each cufft is about 25-30 megs.
Try [URL]http://filesmelt.com/dl/cudarts.zip[/URL]

It should contain:
cudart32_32_16.dll
cudart32_40_17.dll
cudart32_41_28.dll
cudart64_32_16.dll
cudart64_41_28.dll

If you can take them I will put some cufft too. I could only find cufft64, which works with both win7 and vista 64 bits.
cufft64_32_16.dll
cufft64_41_28.dll

This should be enough. 41_28 works wherever 40_17 works and for my gtx580 is a bit faster. Some releases of mfaktc may contain the rest of them, if one needs others.

edit: And to be ontopic: another 3 DC matches with v2.03, and one mismatch (26114747). I am running the CL-TC for the mismatch. Score is 5 to 1. |
Thanks :smile: I don't think anybody actually uses 32 bit.
PS flash, be prepared for a test/data-collection version coming soon. (I really just need you to compile it.) I'll need data, lots of data, hopefully for a variety of cards. |
1 Attachment(s)
I attach here a sfv file generated by total commander (renamed .txt to be able to attach it). I am a bit afraid of using those file-drop sites. They can mess with the dlls and exes very easy.
|
[QUOTE=LaurV;301850]I attach here a sfv file generated by total commander (renamed .txt to be able to attach it). I am a bit afraid of using those file-drop sites. They can mess with the dlls and exes very easy.[/QUOTE]
What's sfv? And that particular drop site was made by one of my high school friends :smile: (The same one who got me a 2600K for $150 :lol:) Edit: Yes please, can I have the cufft's as well? |
1 Attachment(s)
[QUOTE=Dubslow;301858]What's sfv?[/QUOTE]
Google is your [URL="http://en.wikipedia.org/wiki/Simple_file_verification"]friend[/URL] :D (in this case wikipedia) edit: cuffts are on the way. [strike] What's the drop limit? (zip file is 33 megs and I wonder if I should compress it harder or split it in two. I tried already two times and yer friend's site says I/O error and crashes at about half. Is it my net - as usually - bad Sunday morning or the site has a limit of 15 megs?)[/strike] edit 2: It went through. Here [URL="http://filesmelt.com/dl/cuffts.zip"]the link[/URL]. The .sfv is attached to this post. BTW, yer linux should have this built-in by default, no need any tools like total commander which I am using in windoze. Ye should know it... WinSFV is very convenient, just doubleclick on any sfv file and it tells you if any of the files checksumed inside was changed. Of course you may need to rename it back (delete the .txt extension which I added for forum reasons) |
Theoretically it should handle it, but with files that large you never know. Email it to me.
Edit: Wow, I'm amazed. I managed to DL the file in 2 seconds flat. FS must have a server nearby to me. (Edit2: No, it wasn't installed by default. :razz: MD5 is much more common. Addendum: [quote='man cksfv']cksfv is a tool for verifying CRC32 checksums of files. CRC32 checksums are used to verify that files are not corrupted. The algorithm is cryptographically crippled so it can not be used for security purposes. md5sum (1) or sha1sum (1) are much better tools for checksuming files. cksfv should only be used for compatibility with other systems.[/quote]) |
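For reference, the CRC32 that the man page excerpt above mentions is, as far as I know, the same CRC-32 used by zip (reflected polynomial 0xEDB88320). A bitwise sketch is below; `crc32_buf()` is a hypothetical name, and real tools use a table-driven version for speed. As the man page says, this detects accidental corruption only and is no protection against deliberate tampering.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320, initial value and
 * final XOR 0xFFFFFFFF). Hypothetical helper for illustration. */
uint32_t crc32_buf(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; ++i) {
        crc ^= data[i];
        for (bit = 0; bit < 8; ++bit) {
            /* If the low bit is set, shift and XOR in the polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this CRC is 0xCBF43926 for the nine ASCII bytes "123456789", which is a quick way to confirm an implementation.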
LaurV, for printing total time, does that need to appear in the results file, or just on the screen?
Edit: I've found a bug in 2.03, however it will only manifest itself if you screw up the formatting of the SaveFolder, ResultsFile, or WorkFile options in the ini file.

PS @LaurV:
[code]
bill@Gravemind:~/CUDALucas∰∂ ./new.CUDALucas 132049 -f 0
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
Starting M132049 fft length = 7168
Iteration 30000 M( 132049 )C, 0xbcd4392925c8b6c9, n = 7K, CUDALucas v2.04 Alpha err = 0.0103 (0:03 real, 0.0892 ms/iter, ETA 0:08)
^C SIGINT caught. Writing checkpoint.
bill@Gravemind:~/CUDALucas∰∂ ./new.CUDALucas 132049 -f 0
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
Continuing work from a partial result of M132049 fft length = 7168 iteration = 40129
Iteration 60000 M( 132049 )C, 0x1a3c4b80c267f04f, n = 7K, CUDALucas v2.04 Alpha err = 0.0097 (0:02 real, 0.0578 ms/iter, ETA 0:03)
Iteration 90000 M( 132049 )C, 0x28ecbb0541f5ec16, n = 7K, CUDALucas v2.04 Alpha err = 0.0097 (0:03 real, 0.0873 ms/iter, ETA 0:02)
Iteration 120000 M( 132049 )C, 0x816902f6d3a9764a, n = 7K, CUDALucas v2.04 Alpha err = 0.0103 (0:02 real, 0.0872 ms/iter, ETA 0:00)
M( 132049 )P, n = 7K, CUDALucas v2.04 Alpha. Estimated total time: 0:11
bill@Gravemind:~/CUDALucas∰∂
[/code]
:grin: The best part is that it can still read the old checkpoints, realize it's reading old checkpoints and then not print the time. |
[QUOTE=Dubslow;301514]After much fighting with SF Beta and SVN, there is now a CUDALucas SourceForge [URL="https://sourceforge.net/projects/cudalucas/"]page[/URL]. :smile:
In addition, timeval.c is gone entirely, not to mention various changes to the defines and includes. Any binaries compiled should be identical, but flash should also test and make sure this newer stuff does actually compile. Does anybody feel like writing a README? (I've marked it as GPL, but that's certainly open to discussion.) msft and flash, please report your SF usernames so I can add you. Anyone else is welcome to join as a "Member". (Or Developer if you ask nicely. :smile:)[/QUOTE]
Ok, SVN was a pretty big learning curve :smile:

I updated the files per the comments to rev20.

@Dubslow, can you recompile and test in Linux? I changed the IniGetStr function and added a custom sprintf_s for MSVS that should only affect writing results.txt files, but I need you to test compile/run again.

Otherwise, everything seems to be working well. After you test compile/run, I'll recompile and post to SourceForge. |
[QUOTE=flashjh;302037]Ok, SVN was a pretty big learning curve :smile:
[/quote]Yeah, me too. Check out the comments for r19 :razz:
[QUOTE=flashjh;302037] I updated the files per the comments to rev20.

@Dubslow, can you recompile and test in Linux? I changed the IniGetStr function and added a custom sprintf_s for MSVS that should only affect writing results.txt files, but I need you to test compile/run again.

Otherwise, everything seems to be working well. After you test compile/run, I'll recompile and post to SourceForge.[/QUOTE]
That was indeed the bug I was referring to. I'll test it, but the code looks good. (spritf?)

I wasn't intending to post executables to SF until it moves to at least Beta. I'm only about halfway through the changes, not done yet :smile: (Among other things, results file locking isn't implemented yet.) If you want, feel free to post executables here, but I do warn everyone, it's still in Alpha. :razz:

In particular, you can compile that "test" version I mentioned, flash: use /DTEST as defined by the makefile rule "test". I was waiting to ask you to make it until I had written the Python I need locally to interpret the results, but you can make it now if you want. :smile: |
Edit (I wish): Yes, it does compile fine.
In other news, I suddenly can't commit to svn/tags.
[code]
bill@Gravemind:~/CUDALucas∰∂ svn commit --username=dubslow tags/v2.03-final/
svn: Commit failed (details follow):
svn: Server sent unexpected return value (403 Forbidden) in response to MKACTIVITY request for '/p/cudalucas/code/!svn/act/2c8fdab0-c8b5-425a-b904-ad76555a2b37'
svn: Your commit message was left in a temporary file:
svn: '/home/bill/CUDALucas/tags/svn-commit.tmp'
bill@Gravemind:~/CUDALucas∰∂ svn commit --username dubslow tags/v2.03-final/
svn: Commit failed (details follow):
svn: Server sent unexpected return value (403 Forbidden) in response to MKACTIVITY request for '/p/cudalucas/code/!svn/act/56dab32b-3b9e-4bfe-877f-0520584bae98'
svn: Your commit message was left in a temporary file:
svn: '/home/bill/CUDALucas/tags/svn-commit.2.tmp'
bill@Gravemind:~/CUDALucas∰∂
[/code]
It didn't even give me a chance to enter my password. :huh: |
[QUOTE=Dubslow;302040] I'm only about halfway through the changes, not done yet :smile:[/QUOTE]
I got off my butt and coded some of them. A silly bug took 20 minutes of my time, but the results are pretty. :grin: [code]bill@Gravemind:~/CUDALucas/test∰∂ cat worktodo.txt DoubleCheck=N/A,216091,24K,1,69 Test=12K,N/A,69,216091 Test=86243,4K DoubleCheck=CA9CAECD26710FC828DFBBB8________,26458577,69,1 bill@Gravemind:~/CUDALucas/test∰∂ ./new.CUDALucas Starting M216091 fft length = 24K Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1734 ms/iter, ETA 0:34) Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1752 ms/iter, ETA 0:33) Iteration 30000 M( 216091 )C, 0x540772c2abb7833a, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1708 ms/iter, ETA 0:30) Iteration 40000 M( 216091 )C, 0xc26da9695ac418c1, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:01 real, 0.1750 ms/iter, ETA 0:29) Iteration 50000 M( 216091 )C, 0x95ce3ff44abdd1e5, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1723 ms/iter, ETA 0:27) Iteration 60000 M( 216091 )C, 0x99aa87c495daffe7, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1713 ms/iter, ETA 0:25) Iteration 70000 M( 216091 )C, 0x505d249be3145893, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1716 ms/iter, ETA 0:24) Iteration 80000 M( 216091 )C, 0xddf612c72037b8a1, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:01 real, 0.1703 ms/iter, ETA 0:22) Iteration 90000 M( 216091 )C, 0xb5d8309a1ce9e2b6, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1747 ms/iter, ETA 0:20) Iteration 100000 M( 216091 )C, 0x4de7f101ee1cb7a5, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1702 ms/iter, ETA 0:18) Iteration 110000 M( 216091 )C, 0x10aa3286c0b03369, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:01 real, 0.1694 ms/iter, ETA 0:16) Iteration 120000 M( 216091 )C, 0x3981b56788b529e2, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1718 ms/iter, ETA 0:15) Iteration 130000 M( 216091 
)C, 0x80438af231f8fccd, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1734 ms/iter, ETA 0:13) Iteration 140000 M( 216091 )C, 0x669382faea06df89, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1744 ms/iter, ETA 0:12) Iteration 150000 M( 216091 )C, 0x1b73cb121df7d6fa, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:01 real, 0.1715 ms/iter, ETA 0:10) Iteration 160000 M( 216091 )C, 0xb391010f29c70ee1, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1710 ms/iter, ETA 0:08) Iteration 170000 M( 216091 )C, 0x04055d84a77be1d8, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1709 ms/iter, ETA 0:06) Iteration 180000 M( 216091 )C, 0xe3d74c104f02967d, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:01 real, 0.1711 ms/iter, ETA 0:05) Iteration 190000 M( 216091 )C, 0x54b2a8b9cb149f9f, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1713 ms/iter, ETA 0:03) Iteration 200000 M( 216091 )C, 0xf433496947b7b103, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1708 ms/iter, ETA 0:01) Iteration 210000 M( 216091 )C, 0xcfe091c8f59f8a7b, n = 24K, CUDALucas v2.04 Alpha err = 0.0000 (0:02 real, 0.1700 ms/iter, ETA 0:00) M( 216091 )P, n = 24K, CUDALucas v2.04 Alpha. 
Estimated total time: 0:38 Starting M216091 fft length = 12K Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:01 real, 0.1511 ms/iter, ETA 0:30) Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:02 real, 0.1476 ms/iter, ETA 0:28) Iteration 30000 M( 216091 )C, 0x540772c2abb7833a, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:01 real, 0.1479 ms/iter, ETA 0:26) Iteration 40000 M( 216091 )C, 0xc26da9695ac418c1, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:02 real, 0.1494 ms/iter, ETA 0:25) Iteration 50000 M( 216091 )C, 0x95ce3ff44abdd1e5, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:01 real, 0.1464 ms/iter, ETA 0:23) Iteration 60000 M( 216091 )C, 0x99aa87c495daffe7, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:02 real, 0.1490 ms/iter, ETA 0:22) Iteration 70000 M( 216091 )C, 0x505d249be3145893, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:01 real, 0.1551 ms/iter, ETA 0:21) Iteration 80000 M( 216091 )C, 0xddf612c72037b8a1, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:02 real, 0.1506 ms/iter, ETA 0:19) Iteration 90000 M( 216091 )C, 0xb5d8309a1ce9e2b6, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:01 real, 0.1480 ms/iter, ETA 0:17) Iteration 100000 M( 216091 )C, 0x4de7f101ee1cb7a5, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:02 real, 0.1499 ms/iter, ETA 0:16) Iteration 110000 M( 216091 )C, 0x10aa3286c0b03369, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:01 real, 0.1463 ms/iter, ETA 0:14) Iteration 120000 M( 216091 )C, 0x3981b56788b529e2, n = 12K, CUDALucas v2.04 Alpha err = 0.0045 (0:02 real, 0.1481 ms/iter, ETA 0:13) Iteration 130000 M( 216091 )C, 0x80438af231f8fccd, n = 12K, CUDALucas v2.04 Alpha err = 0.0046 (0:01 real, 0.1458 ms/iter, ETA 0:11) Iteration 140000 M( 216091 )C, 0x669382faea06df89, n = 12K, CUDALucas v2.04 Alpha err = 0.0046 (0:01 real, 0.1460 ms/iter, ETA 0:10) Iteration 150000 M( 216091 )C, 0x1b73cb121df7d6fa, n = 12K, CUDALucas v2.04 Alpha err 
= 0.0046 (0:02 real, 0.1463 ms/iter, ETA 0:08) Iteration 160000 M( 216091 )C, 0xb391010f29c70ee1, n = 12K, CUDALucas v2.04 Alpha err = 0.0046 (0:01 real, 0.1469 ms/iter, ETA 0:07) Iteration 170000 M( 216091 )C, 0x04055d84a77be1d8, n = 12K, CUDALucas v2.04 Alpha err = 0.0046 (0:02 real, 0.1470 ms/iter, ETA 0:05) Iteration 180000 M( 216091 )C, 0xe3d74c104f02967d, n = 12K, CUDALucas v2.04 Alpha err = 0.0046 (0:01 real, 0.1449 ms/iter, ETA 0:04) Iteration 190000 M( 216091 )C, 0x54b2a8b9cb149f9f, n = 12K, CUDALucas v2.04 Alpha err = 0.0046 (0:02 real, 0.1469 ms/iter, ETA 0:02) Iteration 200000 M( 216091 )C, 0xf433496947b7b103, n = 12K, CUDALucas v2.04 Alpha err = 0.0046 (0:01 real, 0.1482 ms/iter, ETA 0:01) Iteration 210000 M( 216091 )C, 0xcfe091c8f59f8a7b, n = 12K, CUDALucas v2.04 Alpha err = 0.0046 (0:02 real, 0.1485 ms/iter, ETA 0:00) M( 216091 )P, n = 12K, CUDALucas v2.04 Alpha. Estimated total time: 0:32 Starting M86243 fft length = 4K Iteration 10000 M( 86243 )C, 0x26d11035920b3773, n = 4K, CUDALucas v2.04 Alpha err = 0.2617 (0:01 real, 0.1160 ms/iter, ETA 0:08) Iteration 20000 M( 86243 )C, 0x233a5255467a4c6e, n = 4K, CUDALucas v2.04 Alpha err = 0.2617 (0:01 real, 0.1137 ms/iter, ETA 0:06) Iteration 30000 M( 86243 )C, 0x88e3195a12367bb8, n = 4K, CUDALucas v2.04 Alpha err = 0.2617 (0:01 real, 0.1141 ms/iter, ETA 0:05) Iteration 40000 M( 86243 )C, 0x70b63ef639328851, n = 4K, CUDALucas v2.04 Alpha err = 0.2617 (0:01 real, 0.1153 ms/iter, ETA 0:04) Iteration 50000 M( 86243 )C, 0x0ff1f54cfeeb4909, n = 4K, CUDALucas v2.04 Alpha err = 0.2617 (0:01 real, 0.1136 ms/iter, ETA 0:03) Iteration 60000 M( 86243 )C, 0x25a4a96c66e7f897, n = 4K, CUDALucas v2.04 Alpha err = 0.2812 (0:02 real, 0.1183 ms/iter, ETA 0:02) Iteration 70000 M( 86243 )C, 0xb639453c818baba2, n = 4K, CUDALucas v2.04 Alpha err = 0.2812 (0:01 real, 0.1139 ms/iter, ETA 0:01) Iteration 80000 M( 86243 )C, 0xdd477c413184da18, n = 4K, CUDALucas v2.04 Alpha err = 0.2812 (0:01 real, 0.1089 ms/iter, ETA 0:00) M( 86243 
)C, 0x2de7056ebffee28b, n = 4K, CUDALucas v2.04 Alpha. Estimated total time: 0:09 Starting M26458577 fft length = 1536K ^C SIGINT caught. Writing checkpoint. bill@Gravemind:~/CUDALucas/test∰∂ [/code] (It can't actually handle underscores, but I edited the output for obvious reasons. :razz:) PS Would any code gurus be willing to examine parse_worktodo_line() starting from line 317 and check for any stupids? |
Reproducible error in cufftbench
@msft: I've stumbled across this seemingly reproducible error in cufftbench().
[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -threads 128 -cufftbench 5881856 5914624 64
CUFFT bench start = 5881856 end = 5914624 distance = 64
CUFFT_Z2Z size= 5881856 time= 986.398254 msec
CUDALucas.cu(1066) : cufftSafeCall() CUFFT error 2: CUFFT_ALLOC_FAILED
CUFFT_INVALID_TYPE
CUFFT_INVALID_VALUE
CUFFT_INTERNAL_ERROR
CUFFT_EXEC_FAILED
CUFFT_SETUP_FAILED
CUFFT_INVALID_SIZE
CUFFT_UNALIGNED_DATA
CUFFT Unknown error code
bill@Gravemind:~/CUDALucas∰∂ CUDALucas -cufftbench 5881856 5914624 64
CUFFT bench start = 5881856 end = 5914624 distance = 64
CUFFT_Z2Z size= 5881856 time= 986.098572 msec
CUDALucas.cu(1066) : cufftSafeCall() CUFFT error 2: CUFFT_ALLOC_FAILED
CUFFT_INVALID_TYPE
CUFFT_INVALID_VALUE
CUFFT_INTERNAL_ERROR
CUFFT_EXEC_FAILED
CUFFT_SETUP_FAILED
CUFFT_INVALID_SIZE
CUFFT_UNALIGNED_DATA
CUFFT Unknown error code
bill@Gravemind:~/CUDALucas∰∂ [/code]
(This is with v2.03, although it also occurs in v2.04_test. In the latter case, it continued to test more lengths, but it did stop before it was supposed to.) Also, as I previously reported, cufftbench() still uses 1-2 full cores. Is that a bug or the nature of the function? |
More data on cufft crash
[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -cufftbench $((256*128)) $((65535*128)) $((256*128))
CUFFT bench start = 32768 end = 8388480 distance = 32768 <good output snipped> CUFFT_Z2Z size= 6815744 time= 17.126163 msec CUFFT_Z2Z size= 6848512 time= 21.510880 msec CUFFT_Z2Z size= 6881280 time= 13.638905 msec CUFFT_Z2Z size= 6914048 time= 699.387634 msec CUFFT_Z2Z size= 6946816 time= 22.775032 msec CUFFT_Z2Z size= 6979584 time= 30.465769 msec CUFFT_Z2Z size= 7012352 time= 37.825619 msec CUFFT_Z2Z size= 7045120 time= 20.284300 msec CUFFT_Z2Z size= 7077888 time= 12.884492 msec CUFFT_Z2Z size= 7110656 time= 18.780321 msec CUFFT_Z2Z size= 7143424 time= 39.204491 msec CUFFT_Z2Z size= 7176192 time= 31.473606 msec CUFFT_Z2Z size= 7208960 time= 18.138344 msec CUFFT_Z2Z size= 7241728 time= 23.035593 msec CUFFT_Z2Z size= 7274496 time= 22.267868 msec CUDALucas.cu(1066) : cufftSafeCall() CUFFT error 2: CUFFT_ALLOC_FAILED CUFFT_INVALID_TYPE CUFFT_INVALID_VALUE CUFFT_INTERNAL_ERROR CUFFT_EXEC_FAILED CUFFT_SETUP_FAILED CUFFT_INVALID_SIZE CUFFT_UNALIGNED_DATA CUFFT Unknown error code [/code] It's a different size this time. |
Hi ,Dubslow
I believe you can read source code. |
[QUOTE=msft;302625]Hi ,Dubslow
I believe you can read source code.[/QUOTE] Yes I can, but I don't have the first clue about CUDA in general or cufft in particular. Just in case, I did look through it and I see the line that's causing the issue, but I have no idea what's wrong or how to fix it.
[code]void cufftbench (int cufftbench_s, int cufftbench_e, int cufftbench_d)
{
  cudaEvent_t start, stop;
  double *x;
  float outerTime;
  int i, j;

  printf ("CUFFT bench start = %d end = %d distance = %d\n", cufftbench_s, cufftbench_e, cufftbench_d);
  cutilSafeCall (cudaMalloc ((void **) &g_x, sizeof (double) * cufftbench_e));
  x = ((double *) malloc (sizeof (double) * cufftbench_e + 1));
  for (i = 0; i <= cufftbench_e; i++)
    x[i] = 0;
  cutilSafeCall (cudaMemcpy (g_x, x, sizeof (double) * cufftbench_e, cudaMemcpyHostToDevice));
  cutilSafeCall (cudaEventCreate (&start));
  cutilSafeCall (cudaEventCreate (&stop));
  for (j = cufftbench_s; j <= cufftbench_e; j += cufftbench_d)
    {
      [B]cufftSafeCall (cufftPlan1d (&plan, j / 2, CUFFT_Z2Z, 1));[/B]
      cufftSafeCall (cufftExecZ2Z (plan, (cufftDoubleComplex *) g_x, (cufftDoubleComplex *) g_x, CUFFT_INVERSE));
      cutilSafeCall (cudaEventRecord (start, 0));
      for (i = 0; i < 100; i++)
        cufftSafeCall (cufftExecZ2Z (plan, (cufftDoubleComplex *) g_x, (cufftDoubleComplex *) g_x, CUFFT_INVERSE));
      cutilSafeCall (cudaEventRecord (stop, 0));
      cutilSafeCall (cudaEventSynchronize (stop));
      cutilSafeCall (cudaEventElapsedTime (&outerTime, start, stop));
      printf ("CUFFT_Z2Z size= %d time= %f msec\n", j, outerTime / 100);
      cufftSafeCall (cufftDestroy (plan));
    }
  cutilSafeCall (cudaFree ((char *) g_x));
  cutilSafeCall (cudaEventDestroy (start));
  cutilSafeCall (cudaEventDestroy (stop));
  free ((char *) x);
}[/code]
The bolded line is the one that's barfing. (I do recognize that it's the line that sets up the FFT, and it's the next line and inner loop that actually execute the FFT.) |
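Separate from the cufftPlan1d failure, the host-side setup in cufftbench() has a small mismatch: malloc (sizeof (double) * cufftbench_e + 1) reserves one extra byte, not one extra double, while the zeroing loop writes index cufftbench_e. Whether this has anything to do with the crash is not established here. A minimal sketch of an allocation that matches the loop bound, with alloc_zeroed as a hypothetical helper name that is not part of CUDALucas:

```c
#include <stdlib.h>

/* Hypothetical helper (not in CUDALucas): allocate room for indices
   0..count inclusive, i.e. count + 1 doubles, all set to zero. */
double *alloc_zeroed(size_t count)
{
    /* calloc takes (number of elements, element size), so the "+ 1"
       adds a whole double rather than a single byte. */
    return (double *) calloc(count + 1, sizeof(double));
}
```

With this shape the separate zeroing loop is no longer needed, since calloc returns zero-filled memory; the caller still has to check for NULL and free the buffer.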
There is a bug in cuda_safecalls.h
[CODE]#ifdef _CUFFT_H_
inline void __cufftSafeCall( cufftResult err, const char *file, const int line )
{
    if( CUFFT_SUCCESS != err) {
        fprintf(stderr, "%s(%i) : cufftSafeCall() CUFFT error %d: ", file, line, (int)err);
        switch (err) {
            case CUFFT_INVALID_PLAN: fprintf(stderr, "CUFFT_INVALID_PLAN\n");
            case CUFFT_ALLOC_FAILED: fprintf(stderr, "CUFFT_ALLOC_FAILED\n");
            case CUFFT_INVALID_TYPE: fprintf(stderr, "CUFFT_INVALID_TYPE\n");
            case CUFFT_INVALID_VALUE: fprintf(stderr, "CUFFT_INVALID_VALUE\n");
            case CUFFT_INTERNAL_ERROR: fprintf(stderr, "CUFFT_INTERNAL_ERROR\n");
            case CUFFT_EXEC_FAILED: fprintf(stderr, "CUFFT_EXEC_FAILED\n");
            case CUFFT_SETUP_FAILED: fprintf(stderr, "CUFFT_SETUP_FAILED\n");
            case CUFFT_INVALID_SIZE: fprintf(stderr, "CUFFT_INVALID_SIZE\n");
            case CUFFT_UNALIGNED_DATA: fprintf(stderr, "CUFFT_UNALIGNED_DATA\n");
            default: fprintf(stderr, "CUFFT Unknown error code\n");
        }
        exit(-1);
    }
}
#endif [/CODE]
A "break" is missing after each and every case statement. That is why the error message dumped all the codes, starting from CUFFT_ALLOC_FAILED. The actual error is only the first one printed, CUFFT_ALLOC_FAILED (error 2). |
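The fix described above can be sketched without the CUDA headers. In the sketch below, fft_result and its enumerators are stand-ins for cufftResult (assumptions for illustration, not the real header), and fft_error_name and report_fft_error are hypothetical helpers. Returning from each case has the same effect as adding a break, so exactly one name is printed per error.

```c
#include <stdio.h>

/* Stand-in for cufftResult so the sketch compiles without cufft.h.
   The names mirror those in the post; the values are illustrative,
   with FFT_ALLOC_FAILED = 2 matching the "CUFFT error 2" in the logs. */
typedef enum {
    FFT_SUCCESS = 0,
    FFT_INVALID_PLAN = 1,
    FFT_ALLOC_FAILED = 2,
    FFT_INVALID_TYPE = 3
} fft_result;

/* Hypothetical helper: map one error code to exactly one name.
   Each case returns, so control never falls through to the next. */
const char *fft_error_name(fft_result err)
{
    switch (err) {
    case FFT_SUCCESS:      return "CUFFT_SUCCESS";
    case FFT_INVALID_PLAN: return "CUFFT_INVALID_PLAN";
    case FFT_ALLOC_FAILED: return "CUFFT_ALLOC_FAILED";
    case FFT_INVALID_TYPE: return "CUFFT_INVALID_TYPE";
    default:               return "CUFFT Unknown error code";
    }
}

/* Report a failure in the same style as the original message. */
void report_fft_error(fft_result err, const char *file, int line)
{
    if (err != FFT_SUCCESS)
        fprintf(stderr, "%s(%i) : cufftSafeCall() CUFFT error %d: %s\n",
                file, line, (int) err, fft_error_name(err));
}
```

Separating the name lookup from the printing also makes the mapping easy to check on its own; the real function would additionally call exit() as the original does.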
Some more "todo" :razz: for you:
- log files. With the old method we were forced to use batch files, which had the advantage that the last line of the batch was usually a "pause". So when CL crashed, we could still see the error. With the new worktodo method we no longer need the batch file, but when there is a crash we can't see the error (the window is closed). It would be nice to have a log file whose name, and eventually verbosity level, could be set from the ini file.
- binaries for the beta. If you want it tested by third parties, it would be nice to supply all the testing tools for it and not force people to crawl through the build process.

OTOH, I am now doing a first-time LL with two cards in parallel (to be sure of the residue). I am at iteration 21M of 46M and everything matches so far. |
Sigh... it had been on my personal todo list to clean up a lot of the functions, which would be necessary to produce logging functionality... it'd probably take me a couple of days.
If you've been following the SourceForge, you should be aware I've been having [URL="https://sourceforge.net/apps/trac/sourceforge/ticket/26276"]a lot of issues[/URL] with it. With one of the suggested fixes, I appear to be able to interact with [i]Subversion[/i] just fine, but SourceForge's [i]display[/i] of the repository does not display correctly. I have only to implement the results file locking functionality, which as you might have seen from mfakto's thread should be as simple as copy and paste. Once that was done and I was satisfied I wasn't going to make any more minor/cosmetic changes to the code, then I was going to post the code here for compiling as well as committing it to SVN. (It should be sometime in the next 12 hours.) (PS flash, do not use https to commit to SourceForge; there's a small chance that it could corrupt the repository.) |
2.04 (Beta)
1 Attachment(s)
[QUOTE=Dubslow;302866]It should be sometime in the next 12 hours...[/QUOTE]
Well, here it is. svn/trunk has been updated to r32, which of course isn't showing up properly in SourceForge per the link in the above post. As such, here's the 2.04 Beta code attached here for flash. It [i]should[/i] be bug-free, but of course that's the whole point of a Beta :smile: At a minimum, README and CUDALucas.ini should be shipped with any executables; the rest of the files are optional. (Of course, none of us really need the README, but hey.)
____________________________________________________________
The changes from 2.03 are pretty much all new features:

The "-i" option from 2.03 (print device info) has been moved to "-info", and in its place we have the same "-i <ini file name>" from mfakto. This means you can run two instances from the same directory, although they each need different work files. It is safe to have two (or more) different instances writing to the same results file, thanks to Bdot's file locking code. It is NOT safe to have two different instances testing the same exponent. LaurV, if you want that ability, how should the checkpoint files be named? (Perhaps "cxxxxxxxx-<ini name>" or "cxxxxxxxx.<device number>"? Such re-naming would only happen if the -i option was used.)

The "total time" estimate has been added as LaurV requested. The estimated total time is printed whenever you pause a test, and when it finishes. However, 2.04 can still read 2.03's save files, will recognize them as such, and won't print any messages in that case.

All FFT lengths are now printed as a multiple of 1024, e.g. "n = 1440K" instead of "n = 1474560".

The FFT length selection code has been spiffed up, though it isn't near the ability of Prime95's jump tables. To help mitigate that, the average error of the first 1000 iterations of every test is calculated, the same as with Prime95's "soft crossover" points.

You can also now specify FFT length for an individual assignment via the work file. To do that, add a field to the "Test=..." assignment line in the work file. To use (e.g.) a 1440K length for a test, the line should look like [code]Test=<assignment key>,<exponent>,1440K[/code] Note that no space is allowed between the number (1440) and the K. You must have a K or M (e.g. "...,<exponent>,3M" for a 3M length) for the program to recognize the field as an FFT length. This feature should render the FFTLength ini option and the -f command line option obsolete, though of course they still work for backwards compatibility.

The work file line parser now works apositionally, so there's no fuss about "Test=N/A,26204951" like in 2.03 with the comma counts and what not. It's smart enough to recognize an AID or exponent (or FFT length) :smile:

The results file default is now "results.txt" (not "result.txt") due to a minor case of OCD on my part :razz:

The AID (if any, not including "N/A") is now printed in the results file. (I plan to add V5UserID and ComputerID much like mfakto in 2.05, in preparation for when Christenson automates mfaktc.)

The "err = " line now prints the maximum error since the last checkpoint, not the maximum error since the last (re)start.
----------------------------------------------------------------------------
What I'm looking for mostly is any cosmetic changes anybody wants, or things to be specifiable via command line/ini file. Examples:
Should the estimated time be presented differently?
Should I have an option to not print the info from the initial round off test?
Should ResultsFile be specifiable via command line?
Should the err printed be specifiable via ini or cmd line?
...or anything else you want changed/added. Please, don't hesitate. :smile:

(PS I still have no idea what might cause cufftbench to crash. I didn't change that except for the 'break's that axn pointed out were missing.) |
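As a rough illustration of the FFT length field rule described above (digits immediately followed by K or M, no space), here is a minimal sketch. parse_fft_field is a hypothetical helper and is not the real parse_worktodo_line(); treating only uppercase K and M as valid is an assumption based on the examples in the post.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the real parse_worktodo_line(): interpret one
   comma-separated field as an FFT length. Returns the length in elements
   (K = 1024, M = 1024 * 1024), or 0 if the field is not an FFT length. */
long parse_fft_field(const char *field)
{
    size_t len = strlen(field);
    long multiplier;
    long value;
    char *end;

    if (len < 2 || !isdigit((unsigned char) field[0]))
        return 0;

    switch (field[len - 1]) {
    case 'K': multiplier = 1024L; break;
    case 'M': multiplier = 1024L * 1024L; break;
    default:  return 0;   /* no K or M suffix: not an FFT length */
    }

    value = strtol(field, &end, 10);
    /* The digits must run right up to the suffix, so "1440 K" is rejected. */
    if (end != field + len - 1 || value <= 0)
        return 0;

    return value * multiplier;
}
```

Under these assumptions "1440K" maps to 1474560, matching the "n = 1440K" versus "n = 1474560" example, while a plain exponent such as "216091" or an "N/A" key yields 0, so a caller could try other interpretations for those fields.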
Demonstration
I tried to edit it into the previous post, but it was too long.
[code]bill@Gravemind:~/CUDALucas/test/test∰∂ cat worktodo.txt DoubleCheck=N/A,216091,24K,1,69 Test=12K,N/A,69,216091 Test=86243,4K,CA9CAECD26710FC828DFBBB8 DoubleCheck=CA9CAECD26710FC828DFBBB8________,26458577,69,1 bill@Gravemind:~/CUDALucas/test/test∰∂ ./CUDALucas Starting M216091 fft length = 24K Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length. Iteration 100, average error = 0.00000, max error = 0.00000 Iteration 200, average error = 0.00000, max error = 0.00000 Iteration 300, average error = 0.00000, max error = 0.00000 Iteration 400, average error = 0.00000, max error = 0.00000 Iteration 500, average error = 0.00000, max error = 0.00000 Iteration 600, average error = 0.00000, max error = 0.00000 Iteration 700, average error = 0.00000, max error = 0.00000 Iteration 800, average error = 0.00000, max error = 0.00000 Iteration 900, average error = 0.00000, max error = 0.00000 Iteration 1000, average error = 0.00000 < 0.25 (max error = 0.00000), continuing test. 
Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:03 real, 0.1236 ms/iter, ETA 0:22) Iteration 40000 M( 216091 )C, 0xc26da9695ac418c1, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:02 real, 0.1172 ms/iter, ETA 0:18) Iteration 60000 M( 216091 )C, 0x99aa87c495daffe7, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:02 real, 0.1173 ms/iter, ETA 0:16) Iteration 80000 M( 216091 )C, 0xddf612c72037b8a1, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:03 real, 0.1175 ms/iter, ETA 0:14) Iteration 100000 M( 216091 )C, 0x4de7f101ee1cb7a5, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:02 real, 0.1172 ms/iter, ETA 0:11) Iteration 120000 M( 216091 )C, 0x3981b56788b529e2, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:02 real, 0.1172 ms/iter, ETA 0:09) Iteration 140000 M( 216091 )C, 0x669382faea06df89, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:03 real, 0.1178 ms/iter, ETA 0:07) Iteration 160000 M( 216091 )C, 0xb391010f29c70ee1, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:02 real, 0.1177 ms/iter, ETA 0:04) Iteration 180000 M( 216091 )C, 0xe3d74c104f02967d, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:02 real, 0.1172 ms/iter, ETA 0:02) Iteration 200000 M( 216091 )C, 0xf433496947b7b103, n = 24K, CUDALucas v2.04 Beta err = 0.0000 (0:03 real, 0.1179 ms/iter, ETA 0:00) M( 216091 )P, n = 24K, CUDALucas v2.04 Beta, estimated total time = 0:26 Starting M216091 fft length = 12K Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length. 
Iteration 100, average error = 0.00310, max error = 0.00415 Iteration 200, average error = 0.00341, max error = 0.00427 Iteration 300, average error = 0.00349, max error = 0.00415 Iteration 400, average error = 0.00351, max error = 0.00415 Iteration 500, average error = 0.00355, max error = 0.00430 Iteration 600, average error = 0.00355, max error = 0.00415 Iteration 700, average error = 0.00356, max error = 0.00421 Iteration 800, average error = 0.00358, max error = 0.00439 Iteration 900, average error = 0.00361, max error = 0.00452 Iteration 1000, average error = 0.00361 < 0.25 (max error = 0.00427), continuing test. Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12K, CUDALucas v2.04 Beta err = 0.0042 (0:02 real, 0.0967 ms/iter, ETA 0:17) Iteration 40000 M( 216091 )C, 0xc26da9695ac418c1, n = 12K, CUDALucas v2.04 Beta err = 0.0042 (0:02 real, 0.0927 ms/iter, ETA 0:14) Iteration 60000 M( 216091 )C, 0x99aa87c495daffe7, n = 12K, CUDALucas v2.04 Beta err = 0.0042 (0:01 real, 0.0930 ms/iter, ETA 0:13) Iteration 80000 M( 216091 )C, 0xddf612c72037b8a1, n = 12K, CUDALucas v2.04 Beta err = 0.0044 (0:02 real, 0.0928 ms/iter, ETA 0:11) Iteration 100000 M( 216091 )C, 0x4de7f101ee1cb7a5, n = 12K, CUDALucas v2.04 Beta err = 0.0044 (0:02 real, 0.0929 ms/iter, ETA 0:09) Iteration 120000 M( 216091 )C, 0x3981b56788b529e2, n = 12K, CUDALucas v2.04 Beta err = 0.0045 (0:02 real, 0.0930 ms/iter, ETA 0:07) Iteration 140000 M( 216091 )C, 0x669382faea06df89, n = 12K, CUDALucas v2.04 Beta err = 0.0046 (0:02 real, 0.0931 ms/iter, ETA 0:05) Iteration 160000 M( 216091 )C, 0xb391010f29c70ee1, n = 12K, CUDALucas v2.04 Beta err = 0.0042 (0:02 real, 0.0928 ms/iter, ETA 0:03) Iteration 180000 M( 216091 )C, 0xe3d74c104f02967d, n = 12K, CUDALucas v2.04 Beta err = 0.0044 (0:02 real, 0.0928 ms/iter, ETA 0:01) Iteration 200000 M( 216091 )C, 0xf433496947b7b103, n = 12K, CUDALucas v2.04 Beta err = 0.0041 (0:01 real, 0.0928 ms/iter, ETA 0:00) M( 216091 )P, n = 12K, CUDALucas v2.04 Beta, estimated 
total time = 0:20 Starting M86243 fft length = 4K Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length. Iteration 100, average error = 0.16577, max error = 0.20703 Iteration 200, average error = 0.18153, max error = 0.25000 Iteration 300, average error = 0.18649, max error = 0.25586 Iteration 400, average error = 0.18799, max error = 0.26172 Iteration 500, average error = 0.19219, max error = 0.25000 Iteration 600, average error = 0.19283, max error = 0.25000 Iteration 700, average error = 0.19504, max error = 0.25000 Iteration 800, average error = 0.19555, max error = 0.22656 Iteration 900, average error = 0.19565, max error = 0.21973 Iteration 1000, average error = 0.19539 < 0.25 (max error = 0.23438), continuing test. Iteration 20000 M( 86243 )C, 0x233a5255467a4c6e, n = 4K, CUDALucas v2.04 Beta err = 0.2188 (0:01 real, 0.0651 ms/iter, ETA 0:03) Iteration 40000 M( 86243 )C, 0x70b63ef639328851, n = 4K, CUDALucas v2.04 Beta err = 0.2344 (0:01 real, 0.0616 ms/iter, ETA 0:02) Iteration 60000 M( 86243 )C, 0x25a4a96c66e7f897, n = 4K, CUDALucas v2.04 Beta err = 0.2812 (0:02 real, 0.0617 ms/iter, ETA 0:01) Iteration 80000 M( 86243 )C, 0xdd477c413184da18, n = 4K, CUDALucas v2.04 Beta err = 0.2219 (0:01 real, 0.0618 ms/iter, ETA 0:00) M( 86243 )C, 0x2de7056ebffee28b, n = 4K, CUDALucas v2.04 Beta, estimated total time = 0:05 Starting M26458577 fft length = 1440K Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length. 
Iteration 100, average error = 0.08321, max error = 0.11523 Iteration 200, average error = 0.09496, max error = 0.11719 Iteration 300, average error = 0.09992, max error = 0.12500 Iteration 400, average error = 0.10188, max error = 0.12500 Iteration 500, average error = 0.10267, max error = 0.11792 Iteration 600, average error = 0.10390, max error = 0.12109 Iteration 700, average error = 0.10444, max error = 0.11914 Iteration 800, average error = 0.10462, max error = 0.11230 Iteration 900, average error = 0.10468, max error = 0.11328 Iteration 1000, average error = 0.10509 < 0.25 (max error = 0.12305), continuing test. Iteration 20000 M( 26458577 )C, 0xc0983d54298faf9e, n = 1440K, CUDALucas v2.04 Beta err = 0.1211 (1:50 real, 5.4642 ms/iter, ETA 40:06:03) Iteration 40000 M( 26458577 )C, 0xc1989436f611c782, n = 1440K, CUDALucas v2.04 Beta err = 0.1172 (1:48 real, 5.4034 ms/iter, ETA 39:37:28) ^C SIGINT caught, writing checkpoint. Estimated time spent so far: 3:53 bill@Gravemind:~/CUDALucas/test/test∰∂ ./CUDALucas Continuing work from a partial result of M26458577 fft length = 1440K iteration = 42802 Iteration 60000 M( 26458577 )C, 0xdcc248d33c956284, n = 1440K, CUDALucas v2.04 Beta err = 0.1230 (1:33 real, 4.6479 ms/iter, ETA 34:03:31) Iteration 80000 M( 26458577 )C, 0x281ebeaddc336d4b, n = 1440K, CUDALucas v2.04 Beta err = 0.1211 (1:48 real, 5.4052 ms/iter, ETA 39:34:42) ^C SIGINT caught, writing checkpoint. Estimated time spent so far: 7:16 bill@Gravemind:~/CUDALucas/test/test∰∂ cat results.txt M( 216091 )P, n = 24K, CUDALucas v2.04 Beta M( 216091 )P, n = 12K, CUDALucas v2.04 Beta M( 86243 )C, 0x2de7056ebffee28b, n = 4K, CUDALucas v2.04 Beta, AID: CA9CAECD26710FC828DFBBB8 bill@Gravemind:~/CUDALucas/test/test∰∂[/code] |
Very good, man! Msft would be proud :D
Now seriously, you are putting a lot of effort into it. Do you never sleep? |
I am trying to compile... there are still some lingering issues because of the change from /Tp (C++) during compile. See [URL="http://andre.stechert.org/urwhatu/2006/01/error_c2143_syn.html"]here[/URL] for an example.
I will work on it some more tomorrow. |
[QUOTE=flashjh;302949]See [URL="http://andre.stechert.org/urwhatu/2006/01/error_c2143_syn.html"]here[/URL] for an example.[/QUOTE]
...compiler error. That's [i]way[/i] stupid. Blegh... Edit: Perhaps if you copy/paste the errors here? |
[QUOTE=Dubslow;302951]...compiler error. That's [i]way[/i] stupid. Blegh...
Edit: Perhaps if you copy/paste the errors here?[/QUOTE] It's working like a C89 compiler should. Declaring variables anywhere but the start of a scope isn't C, even if some compilers allow it as an extension to the language. Annoying, but I think GCC will do the same thing if you set it up to be a true C compiler (i.e. -ansi -pedantic). C is on life support in Visual Studio. Any reason not to build the code as C++? |
[QUOTE=kjaget;303001]Any reason not to build the code as C++?[/QUOTE]
Because it's not C++? :razz: I realize that it [i]shouldn't[/i] cause problems, but there are a few obscure cases where C++ compiler would compile C code differently than a C compiler, and I don't really want to look for those. |
[QUOTE=Dubslow;303023]Because it's not C++? :razz:
I realize that it [i]shouldn't[/i] cause problems, but there are a few obscure cases where C++ compiler would compile C code differently than a C compiler, and I don't really want to look for those.[/QUOTE] The C++ idiosyncrasies have been fixed. It compiles fine now as C only. @Dubslow: As soon as you fix my 'mistake' on SourceForge, I'll get the code uploaded and see if I can get the 2.04 Beta binaries there as well. If not, I'll post here until I can get that fixed. |
Done, and also I just made you a full blown admin.
|
[QUOTE=Dubslow;303109]Done, and also I just made you a full blown admin.[/QUOTE]
CUDALucas 2.04 Beta x64 binaries are posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]here[/URL] Use the included CUDALucas.ini file to resolve the results.txt error (if you haven't already) |
Stupid question: what are the last 4 bytes of the checkpoint files? From the source I can see there is a double called "x", but it is 3 AM here and I want to get 4 hours of sleep before going to work, so I don't have time to dig deeper into the source. I am running a triple check, and up to now all residues match the DC, with 6-7 million iterations still to go. So the residues are the same and the file names are the same (the residue is in the file name too), but when I do a binary comparison, the last 4 bytes always differ by a fixed amount (or a fixed mask, I could not figure it out exactly yet). Is that normal, or is my card already walking in the weeds?
|
Depends what version.
[code]void write_checkpoint (double *x, int q, int n, int j, long total_time)
{
  <snip>
  fwrite (&q, 1, sizeof (q), fPtr);
  fwrite (&n, 1, sizeof (n), fPtr);
  fwrite (&j, 1, sizeof (j), fPtr);
  fwrite (x, 1, sizeof (double) * n, fPtr);
  fwrite (&total_time, 1, sizeof(total_time), fPtr); // exclude this line < 2.04[/code]
So the format is:
Exponent = sizeof(int);
FFT length = sizeof(int);
Iteration = sizeof(int);
Intermediate data = sizeof(double) * FFT length
(v >= 2.04) Total time = sizeof(long)

Presumably sizeof(long)==[STRIKE]4[/STRIKE]8 on your system. Total time is only tracked with precision of 1 second, but there's probably been at least some minor difference in the total time between the two runs. Are you able to compare them bit for bit? That fixed difference you mention shouldn't amount to more than a few seconds.

Here's a couple of programs you could run (they should easily compile with whatever free version of MSVS you have):
[code]#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  /* sizeof yields a size_t; cast so that %lu is correct on every platform */
  printf("The size of a short is %lu\n", (unsigned long) sizeof(short));
  printf("The size of an int is %lu\n", (unsigned long) sizeof(int));
  printf("The size of a long is %lu\n", (unsigned long) sizeof(long));
  printf("The size of a long long is %lu\n", (unsigned long) sizeof(long long));
  printf("The size of a float is %lu\n", (unsigned long) sizeof(float));
  printf("The size of a double is %lu\n", (unsigned long) sizeof(double));
  printf("The size of a long double is %lu\n", (unsigned long) sizeof(long double));
  return 7;
}[/code]
On my system:
[code]bill@Gravemind:~/bin/c∰∂ ./size
The size of a short is 2
The size of an int is 4
The size of a long is 8
The size of a long long is 8
The size of a float is 4
The size of a double is 8
The size of a long double is 16[/code]
Program 2:
[code]#include <stdlib.h>
#include <stdio.h>

void print_time_from_seconds (int sec) // copied almost verbatim from CuLu source
{
  if (sec >= 3600) // >= so that exactly one hour prints as 1:00:00
    {
      printf ("%d", sec / 3600);
      sec %= 3600;
      printf (":%02d", sec / 60);
    }
  else
    printf ("%d", sec / 60);
  sec %= 60;
  printf (":%02d\n", sec);
}

int main(int argc, char** argv)
{
  char* name;
  int q, n, j;
  long t;
  double* x;
  FILE* f;

  if( argc < 2 )
  {
    printf("First argument should be name of checkpoint file\n");
    return -1;
  }
  name = argv[1];
  f = fopen(name, "rb"); // Ignore compiler warnings about "secure functions"
  if( !f )
  {
    printf("Could not open %s\n", name);
    return -1;
  }
  fread(&q, sizeof(int), 1, f);
  fread(&n, sizeof(int), 1, f);
  fread(&j, sizeof(int), 1, f);
  x = (double*) malloc(sizeof(double)*n);
  fread(x, sizeof(double), n, f);
  fread(&t, sizeof(long), 1, f); // only present in checkpoints from >= 2.04
  printf("This is a checkpoint for exp = %d, n = %dK, iter = %d, and total time = %ld = ", q, n/1024, j, t);
  print_time_from_seconds((int) t);
  free(x);
  fclose(f);
  return 127;
}[/code]
[code]bill@Gravemind:~/bin/c∰∂ ckp c26448743
This is a checkpoint for exp = 26448743, n = 1440K, iter = 13820001, and total time = 75137 = 20:52:17[/code] |
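The layout above also says where the trailing field starts, which helps when two checkpoints differ only in their last few bytes. A minimal sketch, assuming the fwrite order shown in write_checkpoint (three ints, then n doubles, then a long for 2.04 and later); total_time_offset and checkpoint_size are hypothetical helper names, not functions from CUDALucas:

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of the trailing total_time field in a
   v2.04 checkpoint, following the fwrite order shown above
   (q, n, j as int, then n doubles). */
size_t total_time_offset(int n)
{
    return 3 * sizeof(int) + (size_t) n * sizeof(double);
}

/* Expected file size; pass has_total_time = 0 for pre-2.04 files.
   sizeof(long) is whatever the compiling platform uses (4 or 8). */
size_t checkpoint_size(int n, int has_total_time)
{
    return total_time_offset(n) + (has_total_time ? sizeof(long) : 0);
}
```

On a build where sizeof(long) is 4, everything before total_time_offset(n) is the exponent, FFT length, iteration and residue data, and only the final 4 bytes hold the elapsed time, which would be consistent with two otherwise identical runs differing only there.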
[QUOTE=flashjh;303125]CUDALucas 2.04 Beta x64 binaries are posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]here[/URL]
Use the included CUDALucas.ini file to resolve the results.txt error (if you haven't already)[/QUOTE] "CUDALucas2.04 Beta-3.2-sm_13-x64.exe" crashes at the end. It leaves results.txt.lck while it's running. If I restart it crashes again. When I manually delete results.txt.lck before restarting it does not crash. Thanks, Andriy |
[QUOTE=apsen;303731]"CUDALucas2.04 Beta-3.2-sm_13-x64.exe" crashes at the end. It leaves results.txt.lck while it's running. If I restart it crashes again. When I manually delete results.txt.lck before restarting it does not crash.
Thanks, Andriy[/QUOTE] Strange, I haven't had any problems with it so far (but I've only completed one DC with it). I will upload a version in a few minutes. Will you test it and let us know if it works? Thanks. Edit: Ok, I made a minor commit and [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]uploaded the files[/URL]. Let us know if they work (or not). |
[QUOTE=flashjh;303733]Strange, I haven't had any problems with it so far (but I've only completed one DC with it). I will upload a version in a few minutes. Will you test it and let us know if it works? Thanks.
Edit: Ok, I made a minor commit and [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]uploaded the files[/URL]. Let us know if they work (or not).[/QUOTE] Hmmm... why do you think that adding the const will help? Is it something I don't know about Microsoft's library functions? (Strictly speaking, frmt should be const as well...) :unsure: Andriy, Is there any sort of error message? Do you know if it crashes in the lock or unlock function? That is, does it crash before or after actually printing the results to the results file? (As for restarting after the crash while the lock file still exists, that will send it into an infinite sleep loop waiting for the file to be unlocked. Does anyone have any better ideas?) Thanks, Bill PS @Everyone: I just realized I made a fairly serious typo in my initial 2.04 post. I said something like "NOT safe to share work files, but is safe to share work files" where of course I meant "NOT safe to share work files, but is safe to share results file". I'm pretty sure everyone understood what I meant; nevertheless, could a mod please fix it? |
[QUOTE=Dubslow;303740]Hmmm... why do you think that adding the const will help? Is it something I don't know about Microsoft's library functions? (Strictly speaking, frmt should be const as well...)
:unsure: Andriy, Is there any sort of error message? Do you know if it crashes in the lock or unlock function? That is, does it crash before or after actually printing the results to the results file? (As for restarting after the crash while the lock file still exists, that will send it into an infinite sleep loop waiting for the file to be unlocked. Does anyone have any better ideas?) Thanks, Bill[/QUOTE] I didn't think it would, but I've been using that version with no errors, so I wanted to get it tested and I didn't want to upload without having the source code there. |
[QUOTE=Dubslow;303740]
Andriy, Is there any sort of error message? Do you know if it crashes in the lock or unlock function? That is, does it crash before or after actually printing the results to the results file? (As for restarting after the crash while the lock file still exists, that will send it into an infinite sleep loop waiting for the file to be unlocked. Does anyone have any better ideas?) Thanks, Bill [/QUOTE] No, it's a complete crash. I could try to debug it if you get me debug info. There is no output to the results file or the screen, and if you start it again it restarts the test from the last checkpoint multiple before the end of the test. |
[QUOTE=Dubslow;303740](As for restarting after the crash while the lock file still exists, that will send it into an infinite sleep loop waiting for the file to be unlocked.
[/QUOTE] No, it crashes. |
[QUOTE=apsen;303759]No, it crashes.[/QUOTE]
Sometimes Windows will detect programs going into infinite loops, and report that the program is not responding. That's not what happened?

Here's the relevant code:
[code]
//[URL="http://sourceforge.net/p/cudalucas/code/35/tree/trunk/parse.c"]parse.c[/URL], line 86
#include <winsock2.h>
#include <io.h>
#include <share.h> //used for _sopen_s
#undef close
#define close _close
#define sched_yield SwitchToThread
#define MODE _S_IREAD | _S_IWRITE
#define strncasecmp _strnicmp

/* Everything from here to the next include is to make MSVS happy. */
#define sscanf sscanf_s /* This only works for scanning numbers, or strings with a defined length (e.g. "%131s") */

void strcopy(char* dest, char* src, size_t n)
{
  strncpy_s(dest, MAX_LINE_LENGTH+1, src, n);
}

FILE* _fopen(const char* path, const char* mode)
{
  FILE* stream;
  errno_t err = fopen_s(&stream, path, mode);
  if(err) return NULL;
  else return stream;
}

void _sprintf(char* buf, char* frmt, const char* string)
{ // only used in filelocking code
  sprintf_s(buf, 251, frmt, string);
}

int open_s(const char *filename, int oflag, int pmode)
{
  int file_handle;
  errno_t err = _sopen_s( &file_handle, filename, oflag, _SH_DENYNO, pmode);
  if (err)
  {
    close (file_handle);
    return -1;
  }
  else
    return 0;
}

void _strcpy(char *dest, const char *src)
{
  strcpy_s (dest, _countof(dest), src);
}
[/code]
[code]//[URL="http://sourceforge.net/p/cudalucas/code/35/tree/trunk/parse.c"]parse.c[/URL] line 728
/*****************************************************************************/
/* mfakto's file locking code */
#define MAX_LOCKED_FILES 3

typedef struct _lockinfo
{
  int lockfd;
  FILE * open_file;
  char lock_filename[256];
} lockinfo;

static unsigned int num_locked_files = 0;
static lockinfo locked_files[MAX_LOCKED_FILES];

FILE *fopen_and_lock(const char *path, const char *mode)
{
  unsigned int i;
  int lockfd;
  FILE *f;
#ifdef EBUG
  printf("\nlock() called on %s\n", path);
#endif
  if (strlen(path) > 250)
  {
    fprintf(stderr, "Cannot open %.250s: Name too long.\n", path);
    return NULL;
  }
  if (num_locked_files >= MAX_LOCKED_FILES)
  {
    fprintf(stderr, "Cannot open %.250s: Too many locked files.\n", path);
    return NULL;
  }
  _sprintf( locked_files[num_locked_files].lock_filename, "%.250s.lck", path);
  for(i=0;;)
  {
    if ((lockfd = open_s(locked_files[num_locked_files].lock_filename, O_EXCL | O_CREAT, MODE)) < 0)
    {
      if (errno == EEXIST)
      {
        if (i==0) fprintf(stderr, "%.250s is locked, waiting ...\n", path);
        if (i<1000) i++; // slowly increase sleep time up to 1 sec
        Sleep(i);
        continue;
      }
      else
      {
        perror("Cannot open lockfile");
        break;
      }
    }
    break;
  }
  locked_files[num_locked_files].lockfd = lockfd;
  if (lockfd > 0 && i > 0)
  {
    printf("Locked %.250s\n", path);
  }
  f = _fopen(path, mode);
  if (f)
  {
    locked_files[num_locked_files++].open_file = f;
  }
  else
  {
    if (close(locked_files[num_locked_files].lockfd) != 0) perror("Failed to close lockfile");
    if (remove(locked_files[num_locked_files].lock_filename)!= 0) perror("Failed to delete lockfile");
  }
#ifdef EBUG
  printf("successfully locked %s\n", path);
#endif
#ifdef TEST
  while(1);
#endif
  return f;
}

int unlock_and_fclose(FILE *f)
{
  unsigned int i, j;
  int ret = 0;
#ifdef EBUG
  printf("unlock() called\n");
#endif
  if (f == NULL) return -1;
  for (i=0; i<num_locked_files; i++)
  {
    if (locked_files[i].open_file == f)
    {
      ret = fclose(f);
      f = NULL;
      if (close(locked_files[i].lockfd) != 0) perror("Failed to close lockfile");
      if (remove(locked_files[i].lock_filename)!= 0) perror("Failed to delete lockfile");
      for (j=i+1; j<num_locked_files; j++)
      {
        locked_files[j-1].lockfd = locked_files[j].lockfd;
        locked_files[j-1].open_file = locked_files[j].open_file;
        _strcpy(locked_files[j-1].lock_filename, locked_files[j].lock_filename);
      }
      num_locked_files--;
      break;
    }
  }
  if (f)
  {
    fprintf(stderr, "File was not locked!\n");
    ret = fclose(f);
  }
#ifdef EBUG
  printf("successfully unlocked\n");
#endif
  return ret;
}
[/code]
[code]//[URL="http://sourceforge.net/p/cudalucas/code/35/tree/trunk/CUDALucas.cu?force=True"]CUDALucas.cu[/URL], near the bottom of check() (line ~1400?)
gettimeofday (&time1, NULL);
FILE* fp = fopen_and_lock(RESULTSFILE, "a");
if(!fp)
{
  fprintf (stderr, "Cannot write results to %s\n\n", RESULTSFILE);
  exit (1);
}
printbits (x, q, n, b, c, high, low, 64, fp, 0);
if( total_time >= 0 )
{ /* Only print time if we don't have an old checkpoint file */
  total_time += (time1.tv_sec - start_time);
  printf (", estimated total time = ");
  print_time_from_seconds(total_time);
}
if( AID[0] && strncasecmp(AID, "N/A", 3) )
{ // If (AID is not null), AND (AID is NOT "N/A") (case insensitive)
  fprintf(fp, ", AID: %s\n", AID);
}
else
{
  fprintf(fp, "\n");
}
unlock_and_fclose(fp);
fflush (stdout);
rm_checkpoint (q);
[/code]
I can't create Windows executables, and I don't know much about MSVS; you'll have to tell flash how to compile it with debugging symbols in Windows.

Edit: Flash! The fix to open_s() never made it into r35! (It should return file_handle, not 0!) |
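To illustrate the open_s() fix mentioned in the edit (return the handle, not 0), here is a rough POSIX analog. It is only a sketch, not the CUDALucas or mfakto code: open_lock and release_lock are hypothetical names, and it uses open() rather than _sopen_s so it can be tried outside Windows. The two points it is meant to show are that the wrapper hands back the descriptor itself, so the caller later closes the lock file rather than descriptor 0, and that on failure there is nothing to close.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical POSIX analog of open_s(): create the lock file exclusively and
   return the descriptor itself. On failure the result is -1 and errno (for
   example EEXIST when the lock is already held) is left for the caller. */
static int open_lock(const char *filename)
{
    /* No close() on the failure path: there is no valid descriptor then. */
    return open(filename, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
}

/* Release a lock taken with open_lock(): close the descriptor, then delete
   the lock file. Returns 0 on success, -1 otherwise. */
static int release_lock(int fd, const char *filename)
{
    assert(fd >= 0);
    if (close(fd) != 0)
        return -1;
    return unlink(filename);
}
```

With a wrapper like this, a second open_lock() on the same name fails with errno set to EEXIST until release_lock() has run, which is the condition the waiting loop in fopen_and_lock() checks for. Since 0 is itself a valid descriptor, a caller would probably want to test for lockfd >= 0 rather than lockfd > 0.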
[QUOTE=Dubslow;303773]Sometimes Windows will detect programs going into infinite loops, and report that the program is not responding. That's not what happened?
Here's the relevant code: ... I can't create Windows executables, and I don't know much about MSVS; you'll have to tell flash how to compile it with debugging symbols in Windows. Edit: Flash! The fix to open_s() never made it into r35! (It should return file_handle, not 0!)[/QUOTE] Yes, I see. I'll fix and repost everything in the morning. EDIT: When quoting the immediately previous message, including the whole thing is unnecessary and only makes the thread hard to read. SB |
Version 2.04 beta, standard (meaning I have not yet picked up any of the fixes you are discussing here):
For me it is not really a crash, but it fails to delete the lck file. I used to delete them by hand as a "routine clearing" process (I have multiple cudalucas folders, one for each instance I run; I do all reports by hand and clear the folders after that by deleting all checkpoints and files except the exe, ini, and worktodo in each folder; this is done by a batch file outside of the folder, in the parent). I will try leaving it there to see what happens. |
[QUOTE=LaurV;303779]Version 2.04 beta, standard (meaning I have not yet picked up any of the fixes you are discussing here):
For me it is not really a crash, but it fails to delete the lck file. I used to delete them by hand as a "routine clearing" process (I have multiple cudalucas folders, one for each instance I run; I do all reports by hand and clear the folders after that by deleting all checkpoints and files except the exe, ini, and worktodo in each folder; this is done by a batch file outside of the folder, in the parent). I will try leaving it there to see what happens.[/QUOTE] You don't get any error messages either, like "Failed to close lockfile" in the code? And it doesn't crash, but just fails to delete the lock file? |