Done, and also I just made you a full blown admin.
|
[QUOTE=Dubslow;303109]Done, and also I just made you a full blown admin.[/QUOTE]
CUDALucas 2.04 Beta x64 binaries are posted [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]here[/URL]. Use the included CUDALucas.ini file to resolve the results.txt error (if you haven't already). |
Stupid question: what are the last 4 bytes of the checkpoint files? From the source I can see there is a double called "x", but it is 3 AM here, I want to get 4 hours of sleep before going to work, and I don't have time to dig deeper into the source. I am running a triple check. So far all my residues match the DC's, with 6-7 million still to go. So the residues are the same and the file names are the same (the residue is in the file name too), but when I do a binary comparison, the last 4 bytes always differ by a fixed amount (or a fixed mask; I have not figured out exactly which yet). Is that normal, or is my card already walking in the weeds?
|
Depends what version.
[code]void write_checkpoint (double *x, int q, int n, int j, long total_time)
{
<snip>
  fwrite (&q, 1, sizeof (q), fPtr);
  fwrite (&n, 1, sizeof (n), fPtr);
  fwrite (&j, 1, sizeof (j), fPtr);
  fwrite (x, 1, sizeof (double) * n, fPtr);
  fwrite (&total_time, 1, sizeof(total_time), fPtr); // exclude this line < 2.04[/code]
So the format is:
Exponent = sizeof(int);
FFT length = sizeof(int);
Iteration = sizeof(int);
Intermediate data = sizeof(double) * FFT length
(v >= 2.04) Total time = sizeof(long)

Presumably sizeof(long)==[STRIKE]4[/STRIKE]8 on your system. Total time is only tracked with a precision of 1 second, but there has probably been at least some minor difference in the total time between the two runs. Are you able to compare them bit for bit? That fixed difference you mention shouldn't amount to more than a few seconds.

Here's a couple of programs you could run (they should easily compile with whatever free version of MSVS you have):
[code]
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  /* sizeof yields a size_t; cast it so that %lu is correct on every platform */
  printf("The size of a short is %lu\n", (unsigned long) sizeof(short));
  printf("The size of an int is %lu\n", (unsigned long) sizeof(int));
  printf("The size of a long is %lu\n", (unsigned long) sizeof(long));
  printf("The size of a long long is %lu\n", (unsigned long) sizeof(long long));
  printf("The size of a float is %lu\n", (unsigned long) sizeof(float));
  printf("The size of a double is %lu\n", (unsigned long) sizeof(double));
  printf("The size of a long double is %lu\n", (unsigned long) sizeof(long double));
  return 0;
}[/code]
On my system:
[code]bill@Gravemind:~/bin/c∰∂ ./size
The size of a short is 2
The size of an int is 4
The size of a long is 8
The size of a long long is 8
The size of a float is 4
The size of a double is 8
The size of a long double is 16[/code]
Program 2:
[code]#include <stdlib.h>
#include <stdio.h>

void print_time_from_seconds (int sec) // copied almost verbatim from CuLu source
{
  if (sec > 3600)
    {
      printf ("%d", sec / 3600);
      sec %= 3600;
      printf (":%02d", sec / 60);
    }
  else
    printf ("%d", sec / 60);
  sec %= 60;
  printf (":%02d\n", sec);
}

int main(int argc, char** argv)
{
  char* name;
  int q, n, j;
  long t;
  double* x;
  FILE* f;

  if( argc < 2 ) {
    printf("First argument should be name of checkpoint file\n");
    return -1;
  }
  name = argv[1];
  f = fopen(name, "rb"); // Ignore compiler warnings about "secure functions"
  if( !f ) {
    printf("Cannot open %s\n", name);
    return -1;
  }
  fread(&q, sizeof(int), 1, f);
  fread(&n, sizeof(int), 1, f);
  fread(&j, sizeof(int), 1, f);
  x = (double*) malloc(sizeof(double)*n);
  if( !x ) {
    printf("Out of memory\n");
    fclose(f);
    return -1;
  }
  fread(x, sizeof(double), n, f);
  fread(&t, sizeof(long), 1, f); // total time, only present for v >= 2.04
  fclose(f);
  free(x);
  printf("This is a checkpoint for exp = %d, n = %dK, iter = %d, and total time = %ld = ", q, n/1024, j, t);
  print_time_from_seconds((int) t);
  return 0;
}[/code]
[code]bill@Gravemind:~/bin/c∰∂ ckp c26448743
This is a checkpoint for exp = 26448743, n = 1440K, iter = 13820001, and total time = 75137 = 20:52:17[/code] |
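For the triple-check comparison asked about above, a minimal sketch of a helper that compares two checkpoint files while ignoring the trailing total-time field might look like the following. It assumes the layout implied by the fwrite calls quoted above (three ints, then n doubles, then the total time); `checkpoints_match` is a hypothetical name and is not part of CUDALucas.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of CUDALucas.
   Assumed layout: q, n, j (three ints), n doubles, then a trailing
   total-time field that is deliberately NOT compared.
   Returns 1 if header and data match bit for bit, 0 if they differ,
   and -1 if either file cannot be opened or read. */
int checkpoints_match(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int ha[3], hb[3];
    double *xa = NULL, *xb = NULL;
    int result = -1;

    if (fa && fb
        && fread(ha, sizeof(int), 3, fa) == 3
        && fread(hb, sizeof(int), 3, fb) == 3) {
        if (memcmp(ha, hb, sizeof ha) != 0) {
            result = 0;                      /* exponent, FFT length or iteration differ */
        } else if (ha[1] > 0) {
            size_t n = (size_t) ha[1];
            xa = malloc(n * sizeof(double));
            xb = malloc(n * sizeof(double));
            if (xa && xb
                && fread(xa, sizeof(double), n, fa) == n
                && fread(xb, sizeof(double), n, fb) == n)
                result = memcmp(xa, xb, n * sizeof(double)) == 0;
        }
    }
    free(xa);
    free(xb);
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return result;
}
```

If this returns 1 for two checkpoints taken at the same iteration, any remaining byte difference between the files is confined to the trailing time field.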
[QUOTE=flashjh;303125]CUDALucas 2.04 Beta x64 binaries are posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]here[/URL]
Use the included CUDALucas.ini file to resolve the results.txt error (if you haven't already)[/QUOTE] "CUDALucas2.04 Beta-3.2-sm_13-x64.exe" crashes at the end. It leaves results.txt.lck while it's running. If I restart it crashes again. When I manually delete results.txt.lck before restarting it does not crash. Thanks, Andriy |
[QUOTE=apsen;303731]"CUDALucas2.04 Beta-3.2-sm_13-x64.exe" crashes at the end. It leaves results.txt.lck while it's running. If I restart it crashes again. When I manually delete results.txt.lck before restarting it does not crash.
Thanks, Andriy[/QUOTE] Strange, I haven't had any problems with it so far (but I've only completed one DC with it). I will upload a version in a few minutes. Will you test it and let us know if it works? Thanks. Edit: Ok, I made a minor commit and [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]uploaded the files[/URL]. Let us know if they work (or not). |
[QUOTE=flashjh;303733]Strange, I haven't had any problems with it so far (but I've only completed one DC with it). I will upload a version in a few minutes. Will you test it and let us know if it works? Thanks.
Edit: Ok, I made a minor commit and [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]uploaded the files[/URL]. Let us know if they work (or not).[/QUOTE] Hmmm... why do you think that adding the const will help? Is it something I don't know about Microsoft's library functions? (Strictly speaking, frmt should be const as well...) :unsure: Andriy, Is there any sort of error message? Do you know if it crashes in the lock or unlock function? That is, does it crash before or after actually printing the results to the results file? (As for restarting after the crash while the lock file still exists, that will send it into an infinite sleep loop waiting for the file to be unlocked. Does anyone have any better ideas?) Thanks, Bill PS @Everyone: I just realized I made a fairly serious typo in my initial 2.04 post. I said something like "NOT safe to share work files, but is safe to share work files" where of course I meant "NOT safe to share work files, but is safe to share results file". I'm pretty sure everyone understood what I meant; nevertheless, could a mod please fix it? |
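On the question of what to do when a stale lock file is left behind: one option is to bound the wait instead of sleeping forever. The sketch below is an assumption-laden illustration, not CUDALucas code. `lock_with_timeout` is a hypothetical name, and it is written against POSIX calls (`open`, `nanosleep`) for brevity, whereas the Windows build discussed here goes through its own wrappers and `Sleep()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create lock_path exclusively, retrying while it
   already exists, but give up once roughly max_wait_ms has been slept.
   Returns the lock file descriptor (>= 0) on success. Returns -1 with
   errno == ETIMEDOUT if the lock never became free, or -1 with errno set
   by open() for any other failure. */
int lock_with_timeout(const char *lock_path, int max_wait_ms)
{
    int waited_ms = 0;
    int delay_ms = 1;

    for (;;) {
        int fd = open(lock_path, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd >= 0)
            return fd;                 /* we own the lock */
        if (errno != EEXIST)
            return -1;                 /* real error, not contention */
        if (waited_ms >= max_wait_ms) {
            errno = ETIMEDOUT;         /* lock is probably stale */
            return -1;
        }
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        waited_ms += delay_ms;
        if (delay_ms < 1000)
            delay_ms++;                /* slowly increase sleep time up to 1 sec */
    }
}
```

A caller that gets ETIMEDOUT can then decide whether to report the stale lock to the user or to remove it; which of those is safe when several instances share one results file is a design choice this sketch leaves open.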
[QUOTE=Dubslow;303740]Hmmm... why do you think that adding the const will help? Is it something I don't know about Microsoft's library functions? (Strictly speaking, frmt should be const as well...)
:unsure: Andriy, Is there any sort of error message? Do you know if it crashes in the lock or unlock function? That is, does it crash before or after actually printing the results to the results file? (As for restarting after the crash while the lock file still exists, that will send it into an infinite sleep loop waiting for the file to be unlocked. Does anyone have any better ideas?) Thanks, Bill[/QUOTE] I didn't think it would, but I've been using that version with no errors, so I wanted to get it tested and I didn't want to upload without having the source code there. |
[QUOTE=Dubslow;303740]
Andriy, Is there any sort of error message? Do you know if it crashes in the lock or unlock function? That is, does it crash before or after actually printing the results to the results file? (As for restarting after the crash while the lock file still exists, that will send it into an infinite sleep loop waiting for the file to be unlocked. Does anyone have any better ideas?) Thanks, Bill [/QUOTE] No, it's a complete crash. I could try to debug it if you get me a build with debug info. There is no output to the results file or the screen, and if you start it again it restarts the test from the last checkpoint multiple before the end of the test. |
[QUOTE=Dubslow;303740](As for restarting after the crash while the lock file still exists, that will send it into an infinite sleep loop waiting for the file to be unlocked.
[/QUOTE] No, it crashes. |
[QUOTE=apsen;303759]No, it crashes.[/QUOTE]
Sometimes Windows will detect programs going into infinite loops, and report that the program is not responding. That's not what happened? Here's the relevant code:
[code]
//[URL="http://sourceforge.net/p/cudalucas/code/35/tree/trunk/parse.c"]parse.c[/URL], line 86
#include <winsock2.h>
#include <io.h>
#include <share.h> //used for _sopen_s
#undef close
#define close _close
#define sched_yield SwitchToThread
#define MODE _S_IREAD | _S_IWRITE
#define strncasecmp _strnicmp

/* Everything from here to the next include is to make MSVS happy. */
#define sscanf sscanf_s /* This only works for scanning numbers, or strings with a defined length (e.g. "%131s") */

void strcopy(char* dest, char* src, size_t n)
{
  strncpy_s(dest, MAX_LINE_LENGTH+1, src, n);
}

FILE* _fopen(const char* path, const char* mode)
{
  FILE* stream;
  errno_t err = fopen_s(&stream, path, mode);
  if(err) return NULL;
  else return stream;
}

void _sprintf(char* buf, char* frmt, const char* string)
{ // only used in filelocking code
  sprintf_s(buf, 251, frmt, string);
}

int open_s(const char *filename, int oflag, int pmode)
{
  int file_handle;
  errno_t err = _sopen_s( &file_handle, filename, oflag, _SH_DENYNO, pmode);
  if (err) {
    close (file_handle);
    return -1;
  } else
    return 0;
}

void _strcpy(char *dest, const char *src)
{
  strcpy_s (dest, _countof(dest), src);
}
[/code]
[code]//[URL="http://sourceforge.net/p/cudalucas/code/35/tree/trunk/parse.c"]parse.c[/URL] line 728
/*****************************************************************************/
/* mfakto's file locking code */
#define MAX_LOCKED_FILES 3

typedef struct _lockinfo
{
  int lockfd;
  FILE * open_file;
  char lock_filename[256];
} lockinfo;

static unsigned int num_locked_files = 0;
static lockinfo locked_files[MAX_LOCKED_FILES];

FILE *fopen_and_lock(const char *path, const char *mode)
{
  unsigned int i;
  int lockfd;
  FILE *f;
#ifdef EBUG
  printf("\nlock() called on %s\n", path);
#endif
  if (strlen(path) > 250)
  {
    fprintf(stderr, "Cannot open %.250s: Name too long.\n", path);
    return NULL;
  }
  if (num_locked_files >= MAX_LOCKED_FILES)
  {
    fprintf(stderr, "Cannot open %.250s: Too many locked files.\n", path);
    return NULL;
  }
  _sprintf( locked_files[num_locked_files].lock_filename, "%.250s.lck", path);

  for(i=0;;)
  {
    if ((lockfd = open_s(locked_files[num_locked_files].lock_filename, O_EXCL | O_CREAT, MODE)) < 0)
    {
      if (errno == EEXIST)
      {
        if (i==0) fprintf(stderr, "%.250s is locked, waiting ...\n", path);
        if (i<1000) i++; // slowly increase sleep time up to 1 sec
        Sleep(i);
        continue;
      }
      else
      {
        perror("Cannot open lockfile");
        break;
      }
    }
    break;
  }
  locked_files[num_locked_files].lockfd = lockfd;
  if (lockfd > 0 && i > 0)
  {
    printf("Locked %.250s\n", path);
  }
  f = _fopen(path, mode);
  if (f)
  {
    locked_files[num_locked_files++].open_file = f;
  }
  else
  {
    if (close(locked_files[num_locked_files].lockfd) != 0) perror("Failed to close lockfile");
    if (remove(locked_files[num_locked_files].lock_filename)!= 0) perror("Failed to delete lockfile");
  }
#ifdef EBUG
  printf("successfully locked %s\n", path);
#endif
#ifdef TEST
  while(1);
#endif
  return f;
}

int unlock_and_fclose(FILE *f)
{
  unsigned int i, j;
  int ret = 0;
#ifdef EBUG
  printf("unlock() called\n");
#endif
  if (f == NULL) return -1;

  for (i=0; i<num_locked_files; i++)
  {
    if (locked_files[i].open_file == f)
    {
      ret = fclose(f);
      f = NULL;
      if (close(locked_files[i].lockfd) != 0) perror("Failed to close lockfile");
      if (remove(locked_files[i].lock_filename)!= 0) perror("Failed to delete lockfile");
      for (j=i+1; j<num_locked_files; j++)
      {
        locked_files[j-1].lockfd = locked_files[j].lockfd;
        locked_files[j-1].open_file = locked_files[j].open_file;
        _strcpy(locked_files[j-1].lock_filename, locked_files[j].lock_filename);
      }
      num_locked_files--;
      break;
    }
  }
  if (f)
  {
    fprintf(stderr, "File was not locked!\n");
    ret = fclose(f);
  }
#ifdef EBUG
  printf("successfully unlocked\n");
#endif
  return ret;
}
[/code]
[code]//[URL="http://sourceforge.net/p/cudalucas/code/35/tree/trunk/CUDALucas.cu?force=True"]CUDALucas.cu[/URL], near the bottom of check() (line ~1400?)
gettimeofday (&time1, NULL);
FILE* fp = fopen_and_lock(RESULTSFILE, "a");
if(!fp)
{
  fprintf (stderr, "Cannot write results to %s\n\n", RESULTSFILE);
  exit (1);
}
printbits (x, q, n, b, c, high, low, 64, fp, 0);
if( total_time >= 0 )
{ /* Only print time if we don't have an old checkpoint file */
  total_time += (time1.tv_sec - start_time);
  printf (", estimated total time = ");
  print_time_from_seconds(total_time);
}

if( AID[0] && strncasecmp(AID, "N/A", 3) )
{ // If (AID is not null), AND (AID is NOT "N/A") (case insensitive)
  fprintf(fp, ", AID: %s\n", AID);
}
else
{
  fprintf(fp, "\n");
}
unlock_and_fclose(fp);
fflush (stdout);
rm_checkpoint (q);
[/code]
I can't create Windows executables, and I don't know much about MSVS; you'll have to tell flash how to compile it with debugging symbols in Windows.

Edit: Flash! The fix to open_s() never made it into r35! (It should return file_handle, not 0!) |
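The fix named in the edit above (return the handle, not 0) could be sketched as follows. This is an illustration under assumptions, not the committed code: the `_WIN32` branch uses `_sopen_s` as in the r35 listing, and the `#else` branch is a POSIX stand-in added here only so the sketch also builds outside Windows. It also drops the `close()` on the error path, since no handle was opened in that case.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#ifdef _WIN32
#include <io.h>
#include <share.h>   /* _SH_DENYNO */
#else
#include <unistd.h>  /* close, unlink for callers of the POSIX stand-in */
#endif

/* Sketch of open_s() that hands the descriptor back to the caller.
   Returns the descriptor (>= 0) on success, -1 on failure with errno set. */
int open_s(const char *filename, int oflag, int pmode)
{
#ifdef _WIN32
    int file_handle = -1;
    errno_t err = _sopen_s(&file_handle, filename, oflag, _SH_DENYNO, pmode);
    if (err) {
        errno = err;          /* keep errno in sync for "errno == EEXIST" checks */
        return -1;            /* nothing was opened, so nothing to close */
    }
    return file_handle;       /* was "return 0" in r35 */
#else
    /* POSIX stand-in (an assumption of this sketch, not in the original) */
    return open(filename, oflag, (mode_t) pmode);
#endif
}
```

With the r35 version, fopen_and_lock() stores 0 as lockfd on success, so unlock_and_fclose() appears to call close() on descriptor 0 rather than on the lock file's own handle. Returning the real handle lets the existing close() and remove() calls act on the lock file itself.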