mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Batalov 2012-07-01 11:28

Regardless of what you are debugging, this line is up to no good:
[CODE] if( AID[0] && strncasecmp(AID, "N/A", 3) ) { // If (AID is not null), AND (AID is NOT "N/A") (case insensitive)
[/CODE]

flashjh 2012-07-01 13:55

[QUOTE=flashjh;303777]EDIT: When quoting the immediately previous message, including the whole thing is unnecessary and only makes thread hard to read. SB[/QUOTE]
I understand, but when posting from a mobile phone, it's virtually impossible to edit previous material - the site is not mobile-device friendly. As such, when the posts are long, I won't quote anymore from my phone.
[QUOTE]Yes, I see. I'll fix and repost everything in the morning.[/QUOTE]
The mentioned fix is posted.

Dubslow 2012-07-01 19:08

[URL="http://www.mersenneforum.org/showpost.php?p=303740&postcount=1448"]This[/URL] is from my phone, which is why the PS doesn't quote the mentioned post :razz:


[QUOTE=Batalov;303793]Regardless of what you are debugging, this line is up to no good:
[CODE] if( AID[0] && strncasecmp(AID, "N/A", 3) ) { // If (AID is not null), AND (AID is NOT "N/A") (case insensitive)
[/CODE][/QUOTE]
How so? Do you perhaps mean that AID[0] might not be initialized? While it isn't initialized in the declaration, its first use is in get_next_assignment(), which does this "if (key!=NULL)strcopy(*key, assignment.hex_key, MAX_LINE_LENGTH+1);", where the struct char* is initialized in parse_worktodo_line() to 0: "assignment->hex_key[0] = 0;". Hardly clear :razz: I'll be sure to fix that and just initialize to null at declaration time.

I'm not sure what else it could be, except perhaps reverse AID and "N/A"?

Batalov 2012-07-01 21:37

If AID and AID[0] are initialized, then this check is useless because strncasecmp works fine with empty strings, and if AID isn't then you are crashing right away.

Writing a comment that simply repeats the content of the line (and contradicts what is actually written, in this case) is bad.

Thinking that just because it (supposedly) doesn't crash (because today you think that you know exactly what is called and when) it is an acceptable code is worse.

Dubslow 2012-07-01 21:49

But I do need to check if it's empty. And as I explained before, the string is always properly initialized (though this isn't clear in the slightest by looking at the code, I admit, and it will be fixed).

If it is empty, yes strncasecmp will handle it fine, but it will compare empty to "N/A" as unequal. Thus just "if(strncasecmp(AID, "N/A", 3))" will end up taking the branch, which I don't want to do with an empty string. That would create "CUDALucas.... AID:" which is silly. If there's no AID, then don't needlessly print "AID:".

Thus the comment is exactly what the code does. Perhaps that's not great practice, but I did it for anyone who's not off-handedly familiar with strncasecmp, which until this discussion, I wasn't.
Edit: Perhaps I meant "if (AID is not EMPTY)", is that what you meant about incorrect comment?

I think it's acceptable code because it's the simplest way to make the program do what I want it to do without causing errors.

And as far as "you think that you know exactly what is called and when", isn't that the whole point of *procedural* programming?

PS [URL=http://sourceforge.net/p/cudalucas/code/37/tree/trunk/CUDALucas.cu?diff=4fd01c250594ca22d60001b3:36]I've committed the changes.[/URL]
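
For anyone following along, the empty-string case argued about above can be checked with a small sketch. The helper names `naive_check` and `guarded_check` are made up for illustration and are not in CUDALucas; the only library call is POSIX `strncasecmp` (which parse.c maps to `_strnicmp` on MSVC).

```c
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper: the check WITHOUT the AID[0] test.
   Returns 1 when the "AID: ..." text would be printed. */
int naive_check(const char *aid)
{
    /* strncasecmp returns nonzero when the strings differ,
       and an empty string differs from "N/A". */
    return strncasecmp(aid, "N/A", 3) != 0;
}

/* Hypothetical helper: the check as written in the thread,
   i.e. (AID is not EMPTY) AND (AID is not "N/A", case insensitive). */
int guarded_check(const char *aid)
{
    return aid[0] != '\0' && strncasecmp(aid, "N/A", 3) != 0;
}
```

With an empty string the naive version takes the branch and the guarded version does not, which is the difference being described. Both versions still assume the pointer itself is valid.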

flashjh 2012-07-01 23:17

2.04 Beta x64 binaries updated [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]here[/URL]

Batalov 2012-07-02 01:31

[QUOTE=Dubslow;303821]Edit: Perhaps I meant "if (AID is not EMPTY)", is that what you meant about incorrect comment?
[/QUOTE]
Yes, ok, "EMPTY" is the word I can live with, -- not null!

...but even with "EMPTY" this piece of code sounds like
[CODE] if(2+3==5) { // if two plus three equals five, then[/CODE]

Comments are not for this. Comments should be like
[CODE] dump_interval = MAX(dump_interval,
DEFAULT_DUMP_INTERVAL + 1);

/* make the dump interval a multiple of
the check interval. If this is not done,
eventually we will perform a check and
then less than three iterations later
will get a dump, which performs another
check. The Lanczos recurrence only
guarantees that check vectors more than
three iterations back will be orthogonal
to the current x, so this will cause
spurious failures */

dump_interval += check_interval -
dump_interval %
check_interval;
[/CODE]

and not like
[CODE] dump_interval = MAX(dump_interval,
DEFAULT_DUMP_INTERVAL + 1);
// make dump_interval at least DEFAULT_DUMP_INTERVAL + 1

dump_interval += check_interval -
dump_interval %
check_interval;
// add check_interval - dump_interval modulo check_interval to dump_interval
[/CODE]
Comments are for briefly explaining the reasoning -- and [U]not[/U] for retelling the code once again in human words.
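
As a worked example of the arithmetic in the well-commented snippet, here is a minimal sketch. The function name `next_multiple` is an assumption made for illustration; only the expression itself comes from the code above.

```c
/* Hypothetical helper mirroring
     dump_interval += check_interval - dump_interval % check_interval;
   It moves dump_interval up to a multiple of check_interval. */
int next_multiple(int dump_interval, int check_interval)
{
    /* dump_interval % check_interval is how far we are past the last
       multiple; adding the remaining distance lands on the next one. */
    return dump_interval + check_interval - dump_interval % check_interval;
}
```

Note that a value that is already a multiple is pushed to the following multiple (12 with a check interval of 4 becomes 16), which is harmless for the purpose stated in the comment.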

That line needs no comment, but rather a preceding line
[CODE]assert(AID);
/* or, if you prefer */
assert(AID != NULL);[/CODE]
Then your conscience is clean and your function can be called in any fashion. (You never program alone. Someone will call your function after the list was all consumed and possibly freed. Or will put it in some callback. In short, you never know.)
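
A minimal sketch of how the suggested assert could sit in front of the check. The function name `aid_is_real` is hypothetical; the real code tests `AID` inline.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical wrapper around the check from the thread. */
int aid_is_real(const char *aid)
{
    /* Caller contract: a NULL pointer is a programming error.
       In a debug build this aborts with file and line information
       instead of crashing later inside aid[0]. */
    assert(aid != NULL);

    return aid[0] != '\0' && strncasecmp(aid, "N/A", 3) != 0;
}
```

Keep in mind that assert() compiles to nothing when NDEBUG is defined, so it documents and checks the contract in debug builds but is not a substitute for runtime error handling in release builds.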

LaurV 2012-07-02 17:00

[QUOTE=Dubslow;303785]You don't get any error messages either, like "Failed to close lockfile" in the code?

And it doesn't crash, but just fails to delete the lock file?[/QUOTE]
No idea. Right now I have no CL work. I started to TF the 332M range to 70, as per the discussion in the gpu272 thread, with all the steam, but I had to stop that one too, as I temporarily need the cards for daily work. Like in interrupt mode, the higher-priority interrupt wins. When this is done I will return to mfaktc and 332M5 (I already finished the range to 332M4 inclusive, before the interruption), which will be done in a few days too. Then back to CL-DC. I'll watch for this error and keep you informed if it pops up (it may be that I missed it in the past; as I said, I used to delete the lock file by hand any time I saw it, and that may be the reason why I saw no error like failed to close/delete files). But now I will watch for it.

apsen 2012-07-03 02:12

Crashed again. Looks like it's 100% reproducible.

apsen 2012-07-03 03:55

Guys, do you close the lock file before trying to delete it?

I didn't look at the code but I think you are forgetting to close the file and then do something unkosher when you try to create it again.

Dubslow 2012-07-03 04:39

[QUOTE=apsen;303913]Guys, do you close the lock file before trying to delete it?

I didn't look at the code but I think you are forgetting to close the file and then do something unkosher when you try to create it again.[/QUOTE]

Like I posted above:
[code] //[URL="http://sourceforge.net/p/cudalucas/code/35/tree/trunk/parse.c"]parse.c[/URL], line 86
#include <winsock2.h>
#include <io.h>
#include <share.h> //used for _sopen_s

#undef close
#define close _close
#define sched_yield SwitchToThread
#define MODE _S_IREAD | _S_IWRITE
#define strncasecmp _strnicmp

/* Everything from here to the next include is to make MSVS happy. */
#define sscanf sscanf_s /* This only works for scanning numbers, or strings with a defined length (e.g. "%131s") */

void strcopy(char* dest, char* src, size_t n)
{
  strncpy_s(dest, MAX_LINE_LENGTH+1, src, n);
}

FILE* _fopen(const char* path, const char* mode)
{
  FILE* stream;
  errno_t err = fopen_s(&stream, path, mode);
  if(err) return NULL;
  else return stream;
}

void _sprintf(char* buf, char* frmt, const char* string)
{ // only used in filelocking code
  sprintf_s(buf, 251, frmt, string);
}

int open_s(const char *filename, int oflag, int pmode)
{
  int file_handle;
  errno_t err = _sopen_s( &file_handle, filename, oflag, _SH_DENYNO, pmode);
  if (err)
  {
    close (file_handle);
    return -1;
  }
  else return 0;
}

void _strcpy(char *dest, const char *src)
{
  strcpy_s (dest, _countof(dest), src);
}
[/code]
[code]//[URL="http://sourceforge.net/p/cudalucas/code/35/tree/trunk/parse.c"]parse.c[/URL] line 728
/*****************************************************************************/
/* mfakto's file locking code */

#define MAX_LOCKED_FILES 3

typedef struct _lockinfo
{
  int lockfd;
  FILE * open_file;
  char lock_filename[256];
} lockinfo;

static unsigned int num_locked_files = 0;
static lockinfo locked_files[MAX_LOCKED_FILES];

int unlock_and_fclose(FILE *f)
{
  unsigned int i, j;
  int ret = 0;

#ifdef EBUG
  printf("unlock() called\n");
#endif

  if (f == NULL) return -1;

  for (i=0; i<num_locked_files; i++)
  {
    if (locked_files[i].open_file == f)
    {
      ret = fclose(f);
      f = NULL;
[B]      if (close(locked_files[i].lockfd) != 0) perror("Failed to close lockfile");
      if (remove(locked_files[i].lock_filename)!= 0) perror("Failed to delete lockfile");[/B]
      for (j=i+1; j<num_locked_files; j++)
      {
        locked_files[j-1].lockfd = locked_files[j].lockfd;
        locked_files[j-1].open_file = locked_files[j].open_file;
        _strcpy(locked_files[j-1].lock_filename, locked_files[j].lock_filename);
      }
      num_locked_files--;
      break;
    }
  }

  if (f)
  {
    fprintf(stderr, "File was not locked!\n");
    ret = fclose(f);
  }

#ifdef EBUG
  printf("successfully unlocked\n");
#endif

  return ret;
}
[/code]
I know that flash changed it from what Bdot (the original developer of this code) had to make MSVS warnings go away. The most likely thing, it seems to me, is that somehow the changes broke it. I'll take a look through MSFT's library and see if I can spot something going wrong somewhere.

flashjh 2012-07-04 03:42

To eliminate code change problems, if we can get a set of code without the changes to make MSVS happy, I can get it to compile so we can test the original code.

Dubslow 2012-07-04 04:28

[QUOTE=flashjh;303999]To eliminate code change problems, if we can get a set of code without the changes to make MSVS happy, I can get it to compile so we can test the original code.[/QUOTE]
[url]https://github.com/Bdot42/mfakto/blob/master/src/filelocking.c[/url]

Alternately, try checking out r32, my initial Beta commit.
svn checkout svn://svn.code.sf.net/p/cudalucas/code/trunk -r 32

Bdot 2012-07-04 08:09

[QUOTE=apsen;303910]Crashed again. Looks like it's 100% reproducible.[/QUOTE]

If it is reproducible, could you attach a debugger to it before it crashes? Or start it right out of a debugger? A stacktrace would certainly be of great help. If you can get the debug symbols for your build, then you can even look at the data / variables ...

flashjh 2012-07-04 13:58

I [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]uploaded[/URL] a 3.2 x64 version with debug info (hopefully). If someone could test the debug build and let me know, I would appreciate it.

Bdot 2012-07-04 14:40

[QUOTE=flashjh;304029]I [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]uploaded[/URL] a 3.2 x64 version with debug info (hopefully). If someone could test the debug build and let me know, I would appreciate it.[/QUOTE]
I can try.

apsen, can you give me the command line you used and your .ini / worktodo contents (if any)?

Bdot 2012-07-04 15:45

[QUOTE=Bdot;304032]I can try.

apsen, can you give me the command line you used and your .ini / worktodo contents (if any)?[/QUOTE]

I can reproduce the abort - no need to send any ini/worktodo files.

The "debug" binary, unfortunately, contains no debug information that any of my debuggers could identify.

I tried downloading the source, but my svn client fails to pass the firewall for svn:// links, only http[s] works (via proxy). However, for all http requests to SF, all I get is "200 OK" instead of any file ...

flashjh 2012-07-04 15:59

[QUOTE=Bdot;304033]I can reproduce the abort - no need to send any ini/worktodo files.

The "debug" binary, unfortunately, contains no debug information that any of my debuggers could identify.

I tried downloading the source, but my svn client fails to pass the firewall for svn:// links, only http[s] works (via proxy). However, for all http requests to SF, all I get is "200 OK" instead of any file ...[/QUOTE]
When you build, what do you use to include debugging info? I use MSVS command line, so the /DEBUG doesn't work.

I've been trying to move the projects to the full MSVS, but I haven't spent a lot of time getting it to work yet.

flashjh 2012-07-04 16:07

I posted a new version I compiled with /Zi for cl.exe. I'm using [URL="http://msdn.microsoft.com/en-us/library/19z1t1wy(v=vs.90).aspx"]this page [/URL]for the examples. Let me know if that one has debugging info.

I found that the original debugging info I included is for [URL="http://developer.nvidia.com/cuda-gdb"]CUDA-GDB[/URL] (and it's still there for now).

Bdot 2012-07-04 16:09

[QUOTE=flashjh;304034]When you build, what do you use to include debugging info? I use MSVS command line, so the /DEBUG doesn't work.

I've been trying to move the projects to the full MSVS, but I haven't spent a lot of time getting it to work yet.[/QUOTE]

I usually do
[code]
... /DEBUG /PDB "outfile.pdb" ...
[/code]

This way you have a small exe file you can send around, and if you need the debug info, just send the matching pdb along ... Hmm, I notice, I did not do that for mfakto yet ...

flashjh 2012-07-04 16:32

[QUOTE=Bdot;304037]I usually do
[code]
... /DEBUG /PDB "outfile.pdb" ...
[/code][/QUOTE]

Do you use /DEBUG with the linker or cl? For me, when I use /DEBUG with cl, it won't even compile.

I just used /DEBUG with the linker and it made the .pdb file, they are uploaded to the [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/?"]2.04 beta folder[/URL].

Dubslow 2012-07-04 18:33

Real quick, are all these new versions with Bdot's unmodified code?

And is not /DEBUG merely defining "EBUG" for the preprocessor, not actually including debugging symbols? I will say that most of your conversation is gibberish to me. :smile:

Dubslow 2012-07-04 19:49

PS Even if Bdot's original code also doesn't work, I still suspect it's a MSFT library/API function problem, because his Linux code is working fine for me.

Bdot 2012-07-04 19:54

[QUOTE=Dubslow;304052]Real quick, are all these new versions with Bdot's unmodified code?

And is not /DEBUG merely defining "EBUG" for the preprocessor, not actually including debugging symbols? I will say that most of your conversation is gibberish to me. :smile:[/QUOTE]

:wink:
these are options to the linker, otherwise ... you're right.

Is that the reason for the new lines like these in filelocking.c?
[code]
#ifdef EBUG
...
[/code]

I'll test the debug symbols tomorrow.

Dubslow 2012-07-04 21:36

Whenever I need to conditionally insert print statements, I use EBUG, because then I can do '-DEBUG'. :smile:

In this case, that's leftover from when I initially added the file lock code; I added those print statements just to convince myself I was actually calling the right functions at the right time. I didn't actually do any debugging, because everything appeared to be working.
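
For readers to whom the `-DEBUG` trick is not obvious: `-D` is the compiler's "define a macro" flag, so `-DEBUG` defines the macro `EBUG`. A minimal sketch, with the `TRACE` macro and `unlock_stub` function invented for illustration (the real code uses plain `#ifdef EBUG` blocks around printf):

```c
#include <stdio.h>

/* Building with  gcc -DEBUG file.c  defines EBUG, because -D defines
   whatever name follows it. Without the flag the trace compiles away. */
#ifdef EBUG
#define TRACE(msg) printf("%s\n", (msg))
#define TRACE_ENABLED 1
#else
#define TRACE(msg) ((void)0)
#define TRACE_ENABLED 0
#endif

/* Hypothetical stand-in for a function with a conditional print. */
int unlock_stub(void)
{
    TRACE("unlock() called");
    return TRACE_ENABLED;  /* 1 only in a -DEBUG build */
}
```

This only toggles print statements. It has nothing to do with debug symbols, which need their own compiler and linker options as discussed above.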

flashjh 2012-07-05 03:44

[QUOTE=Dubslow;304057]PS Even if Bdot's original code also doesn't work, I still suspect it's a MSFT library/API function problem, because his Linux code is working fine for me.[/QUOTE]
Ok, I got home today and CL finally produced the same error. Strange thing is I've done 2 other exponents with no problems. I'll do some troubleshooting to see what I can find. The debugging isn't working for MSVS because it's compiled with the command prompt. Tomorrow I'm going to see if I can get CL to compile in MSVS. If anyone can use the debugging info included in the debug version, let me know.

apsen 2012-07-05 12:47

[QUOTE=Bdot;304013] A stacktrace would certainly be of great help. If you can get the debug symbols for your build, then you can even look at the data / variables ...[/QUOTE]

Stack trace wasn't helpful at all and I have no debug info :-(

apsen 2012-07-05 12:54

[QUOTE=flashjh;304029]I [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]uploaded[/URL] a 3.2 x64 version with debug info (hopefully). If someone could test the debug build and let me know, I would appreciate it.[/QUOTE]

That one didn't crash.

And I lost my saved checkpoints so it will take a little bit until I get another set.

apsen 2012-07-05 13:02

[QUOTE=flashjh;304064]Ok, I got home today and CL finally produced the same error. Strange thing is I've done 2 other exponents with no problems. I'll do some troubleshooting to see what I can find. The debugging isn't working for MSVS because it's compiled with the command prompt. Tomorrow I'm going to see if I can get CL to compile in MSVS. If anyone can use the debugging info included in the debug version, let me know.[/QUOTE]

I've compiled the version I got from SF in GUI MSVS but that one behaves with smallest exponent. Didn't have a chance to test with the other ones. Also I'm not sure which version is used to compile the crashing executable...

flashjh 2012-07-05 14:22

[QUOTE=apsen;304077]I've compiled the version I got from SF in GUI MSVS but that one behaves with smallest exponent. Didn't have a chance to test with the other ones. Also I'm not sure which version is used to compile the crashing executable...[/QUOTE]

What settings do you use in MSVS?

The debug version is the latest revision.

apsen 2012-07-05 19:16

[QUOTE=flashjh;304082]What settings do you use in MSVS?

The debug version is the latest revision.[/QUOTE]

I could share MSVS 2010 project file...


I think I've managed to crash that one too this morning; I will check more tonight.

flashjh 2012-07-05 21:56

[QUOTE=apsen;304095]I could share MSVS 2010 project file...
[/QUOTE]
That would be perfect, let me know if you can post it for download or PM me for email.

LaurV 2012-07-08 15:38

[QUOTE=Dubslow;303785]You don't get any error messages either, like "Failed to close lockfile" in the code?

And it doesn't crash, but just fails to delete the lock file?[/QUOTE]
As I said, this is what I usually get (see below; I had to stop it some time before it finished and start it from a batch file to be able to capture the output, otherwise if it crashes the screen is gone**), and the file gets deleted manually as I check the progress every few hours, so when the next expo finishes there is no file anymore. It still creates a new one and forgets to delete it, which will be deleted by hand the next time I check, and so on. But there is no crash of CL. I won't risk leaving the file there until it finishes the next expo. What for?

[CODE]
\CL1>cl204b4020x64

mkdir: cannot create directory `backup1': File exists
Continuing work from a partial result of M26497649 fft length = 1440K iteration = 25924502
Iteration 26000000 M( 26497649 )C, 0x147ebfbc35ee7028, n = 1440K, CUDALucas v2.04 Beta err = 0.1602 (3:58 real, 2.3764 ms/iter, ETA 15:50)
Iteration 26100000 M( 26497649 )C, 0x29fb299b35e9cbe6, n = 1440K, CUDALucas v2.04 Beta err = 0.1514 (5:24 real, 3.2440 ms/iter, ETA 16:13)
Iteration 26200000 M( 26497649 )C, 0x24f8055de4892515, n = 1440K, CUDALucas v2.04 Beta err = 0.1484 (5:15 real, 3.1472 ms/iter, ETA 10:29)
Iteration 26300000 M( 26497649 )C, 0xefd050dabce73911, n = 1440K, CUDALucas v2.04 Beta err = 0.1465 (5:15 real, 3.1483 ms/iter, ETA 5:14)
Iteration 26400000 M( 26497649 )C, 0x561f30c88f709ac5, n = 1440K, CUDALucas v2.04 Beta err = 0.1465 (5:15 real, 3.1455 ms/iter, ETA 0:00)
M( 26497649 )C, 0x3640c1d7bbccd206, n = 1440K, CUDALucas v2.04 Beta, estimated total time = 20:09:02Failed to delete lockfile: Permission denied


Starting M26477161 fft length = 1440K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.08310, max error = 0.11816
Iteration 200, average error = 0.09513, max error = 0.11523
Iteration 300, average error = 0.10003, max error = 0.12500
Iteration 400, average error = 0.10162, max error = 0.11523
Iteration 500, average error = 0.10283, max error = 0.12500
Iteration 600, average error = 0.10387, max error = 0.11719
Iteration 700, average error = 0.10443, max error = 0.11877
Iteration 800, average error = 0.10473, max error = 0.11719
Iteration 900, average error = 0.10501, max error = 0.11963
Iteration 1000, average error = 0.10556 < 0.25 (max error = 0.11719), continuing test.
Iteration 100000 M( 26477161 )C, 0x29c5c3235f93ea86, n = 1440K, CUDALucas v2.04 Beta err = 0.1406 (5:15 real, 3.1537 ms/iter, ETA 23:02:23)
Iteration 200000 M( 26477161 )C, 0x6ba92ed1ae662307, n = 1440K, CUDALucas v2.04 Beta err = 0.1416 (5:15 real, 3.1455 ms/iter, ETA 22:53:31)
Iteration 300000 M( 26477161 )C, 0x37767e5f4c3e32a6, n = 1440K, CUDALucas v2.04 Beta err = 0.1641 (5:15 real, 3.1472 ms/iter, ETA 22:49:02)
Iteration 400000 M( 26477161 )C, 0x32b9c11d199d4789, n = 1440K, CUDALucas v2.04 Beta err = 0.1465 (5:14 real, 3.1461 ms/iter, ETA 22:43:17)
Iteration 500000 M( 26477161 )C, 0x37b6870dea4c136c, n = 1440K, CUDALucas v2.04 Beta err = 0.1406 (5:15 real, 3.1434 ms/iter, ETA 22:36:53)
Iteration 600000 M( 26477161 )C, 0x3e02bb220e7bcfea, n = 1440K, CUDALucas v2.04 Beta err = 0.1563 (5:15 real, 3.1491 ms/iter, ETA 22:34:06)
Iteration 700000 M( 26477161 )C, 0x4e92011453afc83b, n = 1440K, CUDALucas v2.04 Beta err = 0.1436 (5:14 real, 3.1463 ms/iter, ETA 22:27:40)
SIGINT caught, writing checkpoint. Estimated time spent so far: 40:31

\CL1>
[/CODE]

**Note: log files? when?

Dubslow 2012-07-08 19:35

So CUDALucas fails to delete the lock file, but doesn't crash.

As for logging, I'll add that in 2.05, but in the meantime you can redirect the output to a file.

apsen 2012-07-09 17:32

[QUOTE=Dubslow;304297]So CUDALucas fails to delete the lock file, but doesn't crash.[/QUOTE]

It will crash when it finishes the next exponent and the lock file still exists.

apsen 2012-07-09 17:35

[QUOTE=flashjh;304105]That would be perfect, let me know if you can post it for download or PM me for email.[/QUOTE]

Sorry I've been away for the weekend. I'll try to get to it today.

apsen 2012-07-10 14:12

1 Attachment(s)
[QUOTE=flashjh;304105]That would be perfect, let me know if you can post it for download or PM me for email.[/QUOTE]

Here it is:

apsen 2012-07-10 14:22

In open_s, if opening the file is not successful, the code still calls close() on the handle, and the program crashes when it tries to close file handle -1.

I still haven't found why the lock file is not deleted as when I compile it myself I do not see that behavior.
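
A hedged sketch of what a repaired wrapper could look like. It uses POSIX `open` so that it can be compiled anywhere; the Windows build would call `_sopen_s` in the same place. The name `open_checked` is made up for illustration and is not the project's function.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical POSIX analogue of open_s from the listing above. */
int open_checked(const char *filename, int oflag, int pmode)
{
    int fd = open(filename, oflag, (mode_t)pmode);

    if (fd < 0)
        return -1;  /* nothing was opened, so there is nothing to close */

    return fd;      /* hand the descriptor back so the caller can close it */
}
```

Two things differ from the posted open_s: the failure path no longer closes an invalid handle, and the success path returns the handle instead of 0. Whether the second point is related to the undeleted lock file depends on how the caller stores the return value, which is not shown in this thread, so treat that as a guess.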

LaurV 2012-07-21 15:55

That is related to the discussion started in the "332M" thread, about the maximum exponent and FFT size that CL is able to handle.

[CODE]e:\-99-Prime\CudaLucas\CL0>cl204b4020x64.exe

mkdir: cannot create directory `backup0': File exists
over specifications Grid = 65536
try increasing threads (256) or decreasing FFT length (16384K)

e:\-99-Prime\CudaLucas\CL0>cl204b4020x64.exe -threads 512

mkdir: cannot create directory `backup0': File exists
Starting M332194529 fft length = 16384K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED

e:\-99-Prime\CudaLucas\CL0>cl204b4020x64.exe -threads 1024

mkdir: cannot create directory `backup0': File exists
Starting M332194529 fft length = 16384K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED

e:\-99-Prime\CudaLucas\CL0>[/CODE]

Dubslow 2012-07-21 17:11

[QUOTE=LaurV;305387]That is related to the discussion started in the "332M" thread, about the maximum exponent and FFT size that CL is able to handle.

[CODE]e:\-99-Prime\CudaLucas\CL0>cl204b4020x64.exe

mkdir: cannot create directory `backup0': File exists
over specifications Grid = 65536
try increasing threads (256) or decreasing FFT length (16384K)

e:\-99-Prime\CudaLucas\CL0>cl204b4020x64.exe -threads 512

mkdir: cannot create directory `backup0': File exists
Starting M332194529 fft length = 16384K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED

e:\-99-Prime\CudaLucas\CL0>cl204b4020x64.exe -threads 1024

mkdir: cannot create directory `backup0': File exists
Starting M332194529 fft length = 16384K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED

e:\-99-Prime\CudaLucas\CL0>[/CODE][/QUOTE]
I can't help you then, I don't actually know much of anything about the CUFFT library. Was this on a GTX 580? Do you have a 2.1 CC card?

When I tried it, I got an OOM error:
[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -f 16384K -threads 512 332194529

Starting M332194529 fft length = 16384K
CUDALucas.cu(259) : cudaSafeCall() Runtime API error 2: out of memory.[/code]
In the surrounding posts of that discussion I [URL="http://mersenneforum.org/showthread.php?p=295750&highlight=brain#post295750"]linked[/URL], there's some discussion of how much mem is needed; my card is only a 768MB card, so that's my problem. What's yours? (Just add a -info to your options)

lycorn 2012-07-23 07:49

Hi all,

I haven't been following this thread for a while now, so I got a bit confused about the latest available Windows version of the CUDALucas program. Could someone please point me to the right .exe (to be run on a 560Ti)?
Thanks in advance.

LaurV 2012-07-23 08:52

There is a 2.03 Stable version and a (better, but still under work) 2.04 Beta version, both on the [URL="https://sourceforge.net/projects/cudalucas/files"]sourceforge[/URL] page. I personally use the Beta right now. There is no difference in math, just in "cosmetic" things, the Beta has some "improvements" which are partially working, partially are still worked on..:smile:

TObject 2012-07-24 00:04

CUDALucas 2.03

I accidentally hit Ctrl-C twice on the CUDALucas window; as a result, when I continued the test I got the following message: “The checkpoint doesn’t match current test. Current test will be restarted.”

This is bad. It shouldn’t be that easy to lose all the work; especially when some people may be accustomed hitting Ctrl-C twice on an mfaktc window to exit immediately.

Dubslow 2012-07-24 00:39

[QUOTE=TObject;305680]CUDALucas 2.03

I accidentally hit Ctrl-C twice on the CUDALucas window; as a result, when I continued the test I got the following message: “The checkpoint doesn’t match current test. Current test will be restarted.”

This is bad. It shouldn’t be that easy to lose all the work; especially when some people may be accustomed hitting Ctrl-C twice on an mfaktc window to exit immediately.[/QUOTE]

One of the problems (and changes in 2.04) is that the message could mean a variety of things. It could be the meta-data was corrupted, that the exponents didn't match, or (most likely, I think) that the main data was corrupted.

The ^C doesn't do anything itself except set a global quitting variable, which is in turn checked every iteration. A double ^C thus should not have had any effect, except perhaps printing the quitting message twice.

The only possible thing I could think of is that perhaps the second ^C was called while one of the various fwrite() calls was being executed, and that somehow that caused a corruption somewhere. I'll defer to more experienced programmers in that matter.
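
A minimal sketch of the "set a flag, check it every iteration" pattern described above, assuming nothing about the actual CUDALucas source. The names `quitting`, `on_sigint` and `run_iterations` are hypothetical. One detail worth showing: the C standard allows an implementation to reset the signal disposition to the default once the handler has been entered, in which case a second ^C would terminate the process outright. Re-installing the handler from inside the handler guards against that.

```c
#include <signal.h>

/* Flag written by the handler and polled by the main loop. */
static volatile sig_atomic_t quitting = 0;

static void on_sigint(int sig)
{
    /* Re-install first, in case the implementation reset the
       disposition to SIG_DFL before calling us. */
    signal(sig, on_sigint);
    quitting = 1;
}

/* Hypothetical main loop: runs up to max_iters iterations and returns
   how many were completed. For demonstration it sends itself two
   SIGINTs at iteration interrupt_at (pass -1 for none). */
int run_iterations(int max_iters, int interrupt_at)
{
    int i;

    quitting = 0;
    signal(SIGINT, on_sigint);

    for (i = 0; i < max_iters && !quitting; i++) {
        if (i == interrupt_at) {
            raise(SIGINT);  /* first ^C */
            raise(SIGINT);  /* second ^C, must not kill the process */
        }
        /* ... one iteration of real work would go here ... */
    }
    return i;  /* the checkpoint would be written after the loop */
}
```

The point of the design is that the handler does nothing but set the flag, so the checkpoint is always written from normal program flow and never from inside the handler.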

FWIW, I couldn't replicate in 2.04 Beta.
[code]Iteration 7680000 M( 26661529 )C, 0x6a13e9d50b44c72e, n = 1440K, CUDALucas v2.04 Beta err = 0.1523 (1:48 real, 5.4042 ms/iter, ETA 28:29:30)
^C SIGINT caught, writing checkpoint. Estimated time spent so far: 11:44:11

bill@Gravemind:~/CUDALucas∰∂ ^C
bill@Gravemind:~/CUDALucas∰∂ CUDALucas

Continuing work from a partial result of M26661529 fft length = 1440K iteration = 7689302
^C^C SIGINT caught, writing checkpoint. SIGINT caught, writing checkpoint. Estimated time spent so far: 11:44:11

bill@Gravemind:~/CUDALucas∰∂ CUDALucas

Continuing work from a partial result of M26661529 fft length = 1440K iteration = 7689345
Iteration 7700000 M( 26661529 )C, 0x5e6f65ddfa011c0a, n = 1440K, CUDALucas v2.04 Beta err = 0.1406 (0:59 real, 2.9549 ms/iter, ETA 15:33:44)
Iteration 7720000 M( 26661529 )C, 0x572d5b0fd4b87e69, n = 1440K, CUDALucas v2.04 Beta err = 0.1523 (1:52 real, 5.5877 ms/iter, ETA 29:23:50)
Iteration 7740000 M( 26661529 )C, 0xa9f5f7180a3fd8c2, n = 1440K, CUDALucas v2.04 Beta err = 0.1543 (1:50 real, 5.4870 ms/iter, ETA 28:50:14)
Iteration 7760000 M( 26661529 )C, 0x65353774d697b137, n = 1440K, CUDALucas v2.04 Beta err = 0.1453 (1:49 real, 5.4559 ms/iter, ETA 28:38:36)
Iteration 7780000 M( 26661529 )C, 0x474870feb62f6ea0, n = 1440K, CUDALucas v2.04 Beta err = 0.1504 (1:49 real, 5.4690 ms/iter, ETA 28:40:55)
Iteration 7800000 M( 26661529 )C, 0x00e7204a64ae247d, n = 1440K, CUDALucas v2.04 Beta err = 0.1484 (1:50 real, 5.4655 ms/iter, ETA 28:37:59)
^C^C SIGINT caught, writing checkpoint. SIGINT caught, writing checkpoint. Estimated time spent so far: 11:55:57

bill@Gravemind:~/CUDALucas∰∂ CUDALucas

Continuing work from a partial result of M26661529 fft length = 1440K iteration = 7817985[/code]

TObject 2012-07-24 01:00

Thank you for the explanation. Hopefully 2.04 already fixed it. With 2.03 I can reliably duplicate the issue: every time I press Ctrl-C twice in quick succession, this error pops up on the next start.

TObject 2012-07-24 01:10

1 Attachment(s)
I upgraded to the beta version CUDALucas-2.04 Beta-4.1-sm_21-x64.exe and I confirm that the error I reported in the post [url=http://www.mersenneforum.org/showpost.php?p=305680&postcount=1498]1498[/url] has been fixed.

Thank you.

Edit: I spoke too soon. The error is still there, although it took a few tries to replicate it with 2.04.

TObject 2012-07-24 01:42

The error message in 2.04 is “The checkpoint appears to be corrupt. Current test will be restarted.”

A few thoughts:
a) Obviously, it would be nice if the corruption did not happen in the first place.
b) Do not overwrite the backup save file until it is determined that the main file is in good shape.
c) Instead of restarting the test from the beginning attempt to restart it from the backup save file.
d) Consider asking “Do you want to restart the test?” rather than restarting automatically. Some people may have ability to restore save files from backup or restore points; so they would answer no, and go looking for a good version of the save file before newer versions are piled on top.

These are just friendly suggestions. Not complaints. Thank you for your hard work on the application.

Dubslow 2012-07-24 01:51

Hmm... I've no idea what might be causing it. Any experts want to weigh in?

a) Well yeah :smile:
b) You mean when writing checkpoints?
c) Already on the todo list for 2.05
d) Anyone with that ability can restore the checkpoint regardless of whether or not the test is restarted, just delete/overwrite the new/short/restarted checkpoint.

TObject 2012-07-24 02:08

[QUOTE=Dubslow;305696]
b) You mean when writing checkpoints?
[/QUOTE]

Every time the backup save file is overwritten, if it does not cause too much of the performance hit. The idea is to have something to prevent a good backup save file from being overwritten by a corrupted save file.

Dubslow 2012-07-24 02:23

[QUOTE=TObject;305699]Every time the backup save file is overwritten, if it does not cause too much of the performance hit. The idea is to have something to prevent a good backup save file from being overwritten by a corrupted save file.[/QUOTE]

Here's the current pseudo-[URL="http://sourceforge.net/p/cudalucas/code/37/tree/trunk/CUDALucas.cu?force=True"]code[/URL] (Ctrl+F "write_checkpoint" if you're curious):
[code]delete t-checkpoint
move c-ckp to t-ckp
write current data to c-ckp[/code]
What about that would you change?

kladner 2012-07-24 02:46

I just got CL up and running by itself on the GTX 460. With "SaveAllCheckpoints=1" set, I already have 143 saved checkpoints in 3h27m. Obviously, I need to slow the write timing way down.

I say this to ask, do you have this set? Did the error wreck all checkpoints? Does it happen before more than one is created?

Sorry if this has already been addressed. I don't see it in the previous posts (or didn't understand.) I agree that the error should not be happening. But there is a multi-backup function available which would greatly mitigate its effects until the coding gods figure out what's going on.:smile:

Dubslow 2012-07-24 02:48

[QUOTE=kladner;305707]until the coding gods figure out what's going on.:smile:[/QUOTE]
I for one welcome any advice they might have. :razz:

Yes though, as kladner points out, SaveAllCheckpoints is a decent workaround.

LaurV 2012-07-24 03:06

[QUOTE=kladner;305707] Obviously, I need to slow the write timing way down.[/QUOTE]
Use the -c switch with a higher figure (I use 100k, 400k, etc). Writing on files goes together with writing on screen.

About the ctrl-C problem, it is not a bug, but how Windows works. Ctrl-C was the "break" command in the old DOS times, used to forcibly terminate tasks. A double Ctrl-C (i.e. pressing break while the first break interrupt is still being served) acts as "CTRL-Break", which is the same as aborting the process from Task Manager. If this occurs while CL is writing the file (quite likely! the files are big and are written immediately when you press the first ctrl-c), then the file is gone. You have to delete the cxxxxxx file, keep only the txxxxxx, and resume from one step behind. You must be really unlucky to destroy both files, but this is still possible without special precautions in the software. Don't double press ctrl-c. Press it once and be patient. :smile:

And use backups, as kladner said. Disasters happen... better be protected.

TObject 2012-07-24 03:17

[QUOTE=Dubslow;305702]Here's the current pseudo-[URL="http://sourceforge.net/p/cudalucas/code/37/tree/trunk/CUDALucas.cu?force=True"]code[/URL] (Ctrl+F "write_checkpoint" if you're curious):
[code]delete t-checkpoint
move c-ckp to t-ckp
write current data to c-ckp[/code]
What about that would you change?[/QUOTE]

[code]if IsValidSaveFile(c-ckp) then
begin
delete t-checkpoint
move c-ckp to t-ckp
write current data to c-ckp
end
else deal with it...[/code]
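For what it's worth, the check-before-rotate idea might look roughly like this in C. This is only a sketch: `is_valid_checkpoint()` and `rotate_checkpoints()` are hypothetical names, not functions from the CUDALucas source, and the validity test assumes a save file is three ints (q, n, j) followed by at least n doubles. A real check would probably want something stronger, such as a checksum.

```c
#include <stdio.h>

/* Hypothetical validity test. Assumes a checkpoint starts with three ints
   (q, n, j) followed by at least n doubles. Returns 1 if it looks complete. */
int is_valid_checkpoint(const char *path)
{
    int hdr[3];               /* q, n, j */
    long size;
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    if (fread(hdr, sizeof(int), 3, f) != 3 || hdr[1] <= 0 ||
        fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return 0;
    }
    size = ftell(f);
    fclose(f);
    /* Compare in double to avoid integer overflow on a garbage header. */
    return (double) size >= 3.0 * sizeof(int) + (double) hdr[1] * sizeof(double);
}

/* Rotate only if the current c-file is intact; otherwise leave the t-file
   alone so that a good backup survives. Returns 0 on success, -1 if the
   rotation was skipped or failed. The caller then writes the new c-file. */
int rotate_checkpoints(const char *c_path, const char *t_path)
{
    if (!is_valid_checkpoint(c_path))
        return -1;            /* "deal with it": the backup is left untouched */
    remove(t_path);           /* fails harmlessly if there is no backup yet */
    return rename(c_path, t_path) == 0 ? 0 : -1;
}
```

The size check costs one small read and one seek per checkpoint, so the performance hit should be negligible next to writing the data itself.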

Dubslow 2012-07-24 03:23

So check to see if the last checkpoint is good? Why should it do that? That wouldn't help this problem in any way... (I don't think...)

TObject 2012-07-24 03:29

Well, right now the t-file is useless since it is being overwritten by the corrupted c-file.

Dubslow 2012-07-24 03:40

[QUOTE=TObject;305713]Well, right now the t-file is useless since it is being overwritten by the corrupted c-file.[/QUOTE]

I think you misunderstand...

Let's think this way: A is the data to be written for the quit. B is the last checkpoint, and C is the checkpoint before that, which starts as t-ckp.


First, t-ckp is deleted, so the C data is gone.

Then, the B data is moved from c-ckp to t-ckp.

Then, the current/A data is written to c-ckp.

Thus, even if the fwrite()s get interrupted as LaurV described, the B data can still be located in the t-ckp.

I think what would be the better feature, is, as we've talked about, try the backup t-ckp (which would contain the B data) before aborting due to corruption (of the A data). If you guys think that's critical, I can wedge it into a second Beta release once the file locking issue is fixed. (It should be fairly easy to mess with the logic, not prone to bug introduction.)
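The "try the backup t-ckp before aborting" logic could be sketched like this. `load_checkpoint()` and `load_with_fallback()` are made-up names for illustration, and the file layout (three ints q, n, j, then n doubles) is an assumption; the real CUDALucas reader may differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reader: returns a malloc'd array of n doubles, or NULL if the
   file is missing, truncated, or its length field disagrees with expect_n. */
double *load_checkpoint(const char *path, int expect_n, int *q, int *j)
{
    int n;
    double *x;
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    if (fread(q, sizeof(int), 1, f) != 1 ||
        fread(&n, sizeof(int), 1, f) != 1 ||
        fread(j, sizeof(int), 1, f) != 1 ||
        n != expect_n || n <= 0) {
        fclose(f);
        return NULL;
    }
    x = (double *) malloc((size_t) n * sizeof(double));
    if (x && fread(x, sizeof(double), (size_t) n, f) != (size_t) n) {
        free(x);              /* short read: the data section is incomplete */
        x = NULL;
    }
    fclose(f);
    return x;
}

/* Try the newest file (c-ckp, the A data) first, then the backup (t-ckp,
   the B data). *used is set to 'c' or 't' on success; a NULL return means
   both files are unusable and the test has to restart from scratch. */
double *load_with_fallback(const char *c_path, const char *t_path,
                           int expect_n, int *q, int *j, char *used)
{
    double *x = load_checkpoint(c_path, expect_n, q, j);
    if (x) { *used = 'c'; return x; }
    x = load_checkpoint(t_path, expect_n, q, j);
    if (x) { *used = 't'; return x; }
    return NULL;
}
```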

TObject 2012-07-24 03:43

It is not critical. Getting rid of the mfaktc-formed habit of hitting Ctrl-C twice to stop the test eliminates the issue.

I have daily backups, so I lose 24 hours' worth of work at most. I can also use kladner's suggestion.

TObject 2012-07-24 04:14

[QUOTE=Dubslow;305715]I think you misunderstand...[/QUOTE]

It does not really matter whether we understand each other on how exactly it happens. All I am saying is that if checking for corrupted data can be done relatively fast, it would be a good idea to do that before erasing the last good t-ckp.

Bdot 2012-07-24 22:57

[QUOTE=Dubslow;305708]I for one welcome any advice they might have. :razz:
[/QUOTE]

Is CL re-inserting the signal handler for ^C once it has been hit?

Windows has the nasty habit of discarding the signal handler once it has been invoked. Therefore, the first thing mfaktc's signal handler does on Windows is to re-register itself.

In addition, to make the checkpoint-writing signal-proof, you'd need to enclose each write in a loop and keep trying until the desired number of bytes is written.

Or, disable signal-delivery while writing a checkpoint ... hmmm, not sure if Windows knows about sigprocmask.
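The write loop could look something like this minimal C sketch. `write_fully()` is a hypothetical helper name, not something taken from CUDALucas or mfaktc; it keeps calling `fwrite()` until every byte has been handed to the stream, and retries a short write only when the error was an interruption (`EINTR`).

```c
#include <errno.h>
#include <stdio.h>

/* Keep writing until all `len` bytes are accepted by the stream.
   Returns 0 on success, -1 on a non-recoverable error. */
int write_fully(FILE *f, const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *) buf;
    while (len > 0) {
        size_t done = fwrite(p, 1, len, f);
        p += done;
        len -= done;
        if (len > 0) {
            /* Short write: retry only if a signal interrupted us. */
            if (!(ferror(f) && errno == EINTR))
                return -1;
            clearerr(f);
        }
    }
    /* Push the data out of the stdio buffer before reporting success. */
    return fflush(f) == 0 ? 0 : -1;
}
```

Each `fwrite()` in the checkpoint writer would then become a `write_fully()` call whose return value is checked before the save is considered good.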

kladner 2012-07-25 02:15

[QUOTE=kladner;305707]But there is a multi-backup function available.....[/QUOTE]

I just experienced the error you report, TObject. I believe it happened as part of a BSOD, which in turn I think resulted from near-brownout conditions combined with a PSU running near its limit.

In any case, I got the same message about corrupted check files. I then realized that I didn't exactly know how to use the backups. With a little study I decided that the obvious thing was to delete the tiny corrupt files, rename the last save file from s##### to c#####, and delete the period and everything after it. This seems to have restored the run to a place just a few minutes before the crash.

Dubslow 2012-07-25 02:23

[QUOTE=Bdot;305826]In addition, to make the checkpoint-writing signal-proof, you'd need to enclose each write in a loop and keep trying until the desired number of bytes is written.

Or, disable signal-delivery while writing a checkpoint ... hmmm, not sure if Windows knows about sigprocmask.[/QUOTE]
Yech. Too darn hard, too much work for such a stupid problem.

[QUOTE=Bdot;305826]Is CL re-inserting the signal handler for ^C once it has been hit?

Windows has the nasty habit of discarding the signal handler once it has been invoked. Therefore, the first thing mfaktc's signal handler does on Windows is to re-register itself.[/quote]Discards the signal handler? Why in the... so the first line of code in the handler should be to install itself again? Hmm... stupid WinBloze. I'll add it in.

How's the file (un)locking coming along?
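On the handler question: a minimal sketch of "install itself again" in plain C, using only the standard `signal()` call. The names `quitting`, `on_sigint()` and `install_sigint_handler()` are invented for this example and are not the ones used in CUDALucas or mfaktc.

```c
#include <signal.h>

/* Incremented by the handler. The main loop would poll it, write a
   checkpoint and exit cleanly once it becomes nonzero. */
volatile sig_atomic_t quitting = 0;

void on_sigint(int sig)
{
    /* Re-install first: a runtime that resets the handler to the default
       after it fires would otherwise let a second ^C kill the process. */
    signal(sig, on_sigint);
    quitting = quitting + 1;
}

void install_sigint_handler(void)
{
    signal(SIGINT, on_sigint);
}
```

On platforms where the handler stays installed anyway, the extra `signal()` call is harmless, so the same code can be used everywhere without an `#ifdef`.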

Bdot 2012-07-25 06:50

[QUOTE=Dubslow;305869]
How's the file (un)locking coming along?[/QUOTE]

Oops, I did not realize you were waiting for me ...

As the only NV card where I can try CL is in my workstation at work, I could not investigate any further. At the moment there is really no time at all. Sorry for that.

Dubslow 2012-07-25 07:01

[QUOTE=Bdot;305895]Oops, I did not realize you were waiting for me ...

As the only NV card where I can try CL is in my workstation at work, I could not investigate any further. At the moment there is really no time at all. Sorry for that.[/QUOTE]

Ah, sorry about the miscommunication :smile:. flash, have you been able to get anywhere on that? (My problem is that I'm just about the only person besides msft who doesn't use Windows...)

lycorn 2012-07-26 13:02

[QUOTE=LaurV;305573]There is a 2.03 Stable version and a (better, but still under work) 2.04 Beta version, both on the [URL="https://sourceforge.net/projects/cudalucas/files"]sourceforge[/URL] page. I personally use the Beta right now. There is no difference in math, just in "cosmetic" things, the Beta has some "improvements" which are partially working, partially are still worked on..:smile:[/QUOTE]

Thanks, LaurV.
I gave it a go, but unfortunately got a residue mismatch. As it was the second time in two tests (the first one ran a couple of months ago), I'll keep on using the card for TF only.

NormanRKN 2012-07-26 21:25

I haven't read all the posts.
Is there any possibility of an SP version instead of a DP one?
Most NVIDIA cards perform very poorly in DP compared with ATI/AMD GPUs.
That means a waste of time and energy.
My NVIDIA cards always take a very long time, while an ATI/AMD card just warps through it ;)
For me, NVIDIA vs. ATI in DP is like angel and devil.
Please don't crunch DP on an NVIDIA card :no:. It is mostly a waste.
When I see a midrange ATI against a high-end NVIDIA crunching in DP mode... phew, forget NVIDIA ;)
SP is where they work (except, maybe, the Teslas, which have somewhat more DP ;) )

E.g., in MilkyWay@home an ATI is 7x faster than an NVIDIA.

Norman

Dubslow 2012-07-26 22:03

I don't know about SP FFTs, but somehow I don't think it's possible. (I'm not the person who does the math of CUDALucas.)

The reason an LL program hasn't been implemented on OpenCL/AMD cards is because there is no OpenCL FFT library, whereas nVidia provides a cufftlib for CUDA cards.

LaurV 2012-07-27 02:18

SP FFT is not possible, there is not enough room for the carry. Search the forum, there have been a lot of discussions. (If I find some time today I will come back with some link).
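As a toy illustration of the precision gap (this is not the actual FFT error analysis, just assumed example numbers): a float carries a 24-bit significand and a double a 53-bit one, so an integer product that needs 25 bits is already rounded in SP while DP still holds it exactly. The function names below are made up for the example.

```c
/* Square an integer-valued input in single precision. The volatile store
   forces the result to be rounded to float even if the compiler keeps
   intermediates in wider registers. */
double square_sp(int v)
{
    volatile float r = (float) v * (float) v;
    return (double) r;
}

/* The same operation in double precision. */
double square_dp(int v)
{
    volatile double r = (double) v * (double) v;
    return r;
}

/* Difference between the two; nonzero means SP lost low-order bits. */
double sp_loss(int v)
{
    return square_dp(v) - square_sp(v);
}
```

For v = 4097 the exact square is 16785409, which lies above 2^24 and is odd, so a float cannot represent it and `sp_loss(4097)` comes out as 1. In an LL test the low-order bits are exactly the ones that matter.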

Talking amd against nvidia is only a gamer subject. For games, amd cards may be "better" or "faster" and they are "cheaper". But think about that "cheaper" part: generally, you get what you pay for. For "real" [strike]things[/strike]race they don't even see the tail of the nvidias... By the way, do you understand why DP is in general 4-6-8 times slower than SP? Compare filling a barrel with a bucket, and filling it with a cup. With the bucket you will move slower, need more time to fill the bucket, carry it, pour it, etc, but at the end, you will fill the barrel much faster with the bucket than with the cup. Try it.

And see [URL="http://mersenne-aries.sili.net/mfaktc.php?sort=ghdpd"]James' page[/URL] for "real" speed comparison (edit: in Firefox you have to scroll down, that page has a problem when displayed in Firefox if the window is too narrow). Those results don't use any FFT, so the FFT is not the problem. Ati/amd do some [B]inaccurate[/B] "video tricks" faster. But when it comes to (accurate) general processing, they are much slower. There is a big difference between video cards and GPUs. You can - at most - compare ati/amd cards with the last nvidia fiasco: 6xx series, as all of such cards are "video cards", fast for games, lousy for general computing (GPGPU). But you can't compare ati/amd with Fermi or Tesla. No way!

Bdot 2012-07-27 15:08

[QUOTE=Dubslow;306116]The reason an LL program hasn't been implemented on OpenCL/AMD cards is because there is no OpenCL FFT library, whereas nVidia provides a cufftlib for CUDA cards.[/QUOTE]

AMD ships an example FFT implementation along with the APP SDK. As OpenCL is a little hesitant to enforce double type processing (be it emulated on SP-only HW), this example is only SP so far.
I've ordered an HD7850, and when that is up and running I may spend a little time to see if that FFT example can easily be changed to DP. In case that succeeds, I may be able to provide a few performance figures.
In any case, an FFT "library" at an "example" stage is something different than cufft.
[QUOTE=LaurV;306130]
Talking amd against nvidia is only a gamer subject. For games, amd cards may be "better" or "faster" and they are "cheaper". But think about that "cheaper" part: generally, you get what you pay for. For "real" [strike]things[/strike]race they don't even see the tail of the nvidias...

And see [URL="http://mersenne-aries.sili.net/mfaktc.php?sort=ghdpd"]James' page[/URL] for "real" speed comparison. You can - at most - compare ati/amd cards with the last nvidia fiasco: 6xx series, as all of such cards are "video cards", fast for games, lousy for general computing (GPGPU). But you can't compare ati/amd with Fermi or Tesla. No way![/QUOTE]

Quite energetic!
Certainly, the mentioned milkyway, or bit coining etc. are not gaming (in a sense of 3D-video games).

You're quite right that the AMD VLIW5/4 architecture had big difficulties with computing. But when NV made a step backwards with their 6xx series, AMD made a leap forward with its GCN. With mfakto, an HD7970 may still not reach a 580, but it is on par with a 570. And I finally ordered a GCN-card and expect to find one or another trick to speed up GCN even more.

Your statements are on weak ground (at least until "big kepler" blasts AMD to pieces again :smile:).

kracker 2012-07-28 01:53

[QUOTE=Bdot;306187]And I finally ordered a GCN-card and expect to find one or another trick to speed up GCN even more.[/QUOTE]

I would like to know how well the 7850 performs, since I have a 7770 (one step down). I'm just curious how much of a performance increase there is going up to it (not that I'll get it, just curious):smile:

LaurV 2012-08-04 10:46

ok, now the million dollar question: how do I...:whistle:(how to put this?)... "convert" a save file from 1440k fft into 1568k fft? (1600k would work too :D)

The point is that I am testing two 27M exponents, and after 18M and 24M iterations respectively, I get error >0.35, which is reproducible when I restart from 1, 2, or 3 checkpoints earlier. Theoretically, the checkpoint (discrete transform) is converted by the program into the residue (integer), from which the last figures are displayed. The program does this anyhow, as it needs first to keep the errors under control, and second to subtract that "2" on each iteration. The next step should be to convert the binary form of the residue into a "different size" transform. Can I use CL (some switch) or some other 3rd-party tool to do this faster than re-running all the ~20M iterations with a bigger FFT?

Edit: P95 is "trying with a larger FFT" or "trying with a slow method" in these cases. What does CL do to avoid it? One patch for the future should be to select a larger FFT for the 27M range. 1568 is the fastest, after 1440.

Edit2: 27000929 is still running, but the errors are at the limit, I got 0.2578, 0.2582, etc. Here should be the last frontier where CL automatically selects 1440k as FFT size, keeping in mind that not all people do "tuning" for each exponent size :D. You can end up like me, with half of the test done in vain. A method of "conversion" should be available...

Meantime I restarted the other exponents with 1568k, as for my gtx580's all values in between or after, result in longer times. Both runs reached first 1M iter, and up to now all residues (every 100k) match with the first run.

Edit3: 8M iterations, everything matching.

Dubslow 2012-08-04 19:01

Are you using 2.03 or 2.04? The latter should test the average roundoff at the beginning of the test and select a higher length if the roundoff is too high.

Keep in mind that the error shown on screen is only the maximum error since the last checkpoint (2.04) or the maximum error since the last (re)start (2.03). That means the average should be lower than 0.25. Maximum errors of 0.25-0.30, maybe even a bit higher, should still be okay.
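For anyone unfamiliar with the term, the round-off error is how far each value coming out of the inverse transform sits from the nearest integer. A minimal C sketch of the idea follows; `max_roundoff()`, `roundoff_ok()` and `ROUNDOFF_LIMIT` are illustrative names only, with 0.35 taken from the threshold mentioned in this thread, and the real program may well compute it differently.

```c
#include <stddef.h>

#define ROUNDOFF_LIMIT 0.35   /* threshold quoted in this thread */

/* Largest distance of any element from its nearest integer (0 to 0.5).
   Assumes the values fit comfortably in a long long. */
double max_roundoff(const double *x, size_t n)
{
    double worst = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        /* Round half away from zero by truncating after a 0.5 offset. */
        double nearest = (double) (long long) (x[i] + (x[i] >= 0 ? 0.5 : -0.5));
        double err = x[i] - nearest;
        if (err < 0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}

/* 1 if the values are still trustworthy, 0 if a longer FFT is called for. */
int roundoff_ok(const double *x, size_t n)
{
    return max_roundoff(x, n) <= ROUNDOFF_LIMIT;
}
```

An error approaching 0.5 means the program can no longer tell which integer a value was supposed to be, which is why a limit well below that is used.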

As for the error > 0.35, did it print what the actual error was? And yes, what with all the reports of Prime95 v27 issues, it did occur to me that CUDALucas doesn't handle too-large errors very nicely...

As for the "FFT conversion", it should be possible with some slight variant of the teeny-weeny thingies I posted here a few pages back. Note, however, that I make ABSOLUTELY NO GUARANTEE THAT IT WILL WORK IN ANY FASHION. It would be cool if it does though, the idea has occurred to me before :D

[code]#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void print_time_from_seconds (int sec) // copied almost verbatim from CuLu source
{
  if (sec > 3600)
    {
      printf ("%d", sec / 3600);
      sec %= 3600;
      printf (":%02d", sec / 60);
    }
  else
    printf ("%d", sec / 60);
  sec %= 60;
  printf (":%02d\n", sec);
}

int main(int argc, char** argv) {
  char *name, *newname;
  int q, n, j, old, new;
  long t = 0;
  size_t len;
  double* x;
  FILE* f;

  if( argc < 4 ) {
    printf("First argument should be name of checkpoint file, second should be old FFT (full form, not K form), and third should be new FFT\n");
    return 1;
  }
  name = argv[1]; old = atoi(argv[2]); new = atoi(argv[3]);
  if( old <= 0 || new < old ) { // the buffer below must be able to hold the old data
    printf("New FFT length must be at least the old length, aborting\n");
    return 1;
  }
  f = fopen(name, "rb"); // Ignore compiler warnings about "secure functions"
  if( !f ) { perror(name); return 1; }
  fread(&q, sizeof(int), 1, f);
  fread(&n, sizeof(int), 1, f);
  if( n != old ) {
    printf("Supplied old length doesn't match checkpoint's old length, aborting\n");
    fclose(f);
    return 1;
  }
  fread(&j, sizeof(int), 1, f);
  x = (double*) calloc(new, sizeof(double)); // zero-filled, so the extra (new - old) elements are 0
  if( !x ) { printf("Out of memory\n"); fclose(f); return 1; }
  fread(x, sizeof(double), old, f);
  fread(&t, sizeof(long), 1, f); // comment out this line for 2.03 save files
  fclose(f);
  printf("This is a checkpoint for exp = %d, n = %dK, iter = %d, and total time = %ld = ", q, n/1024, j, t);
  print_time_from_seconds((int) t);
  printf("Converting from FFT %d to FFT %d\n", old, new);
  len = strlen(name) + 5; // room for ".new" plus the terminating NUL
  newname = (char*) calloc(len, sizeof(char));
  if( !newname ) { printf("Out of memory\n"); free(x); return 1; }
  snprintf(newname, len, "%s.new", name);
  f = fopen(newname, "wb");
  if( !f ) { perror(newname); free(x); free(newname); return 1; }
  fwrite(&q, sizeof(int), 1, f);
  fwrite(&new, sizeof(int), 1, f); // the header must carry the new length
  fwrite(&j, sizeof(int), 1, f);
  fwrite(x, sizeof(double), new, f);
  fwrite(&t, sizeof(long), 1, f); // comment this out for 2.03 save files
  fclose(f);
  free(x);
  free(newname);
  printf("Written new checkpoint.\n");
  return 0;
}[/code]
[code]bill@Gravemind:~/CUDALucas∰∂ ckpconvert t27812929 1572864 1638400
This is a checkpoint for exp = 27812929, n = 1536K, iter = 140001, and total time = 869 = 14:29
Converting from FFT 1572864 to FFT 1638400
Written new checkpoint.[/code]

Dubslow 2012-08-04 20:24

Edit: Whoops, change "fwrite(&n, sizeof(int), 1, f);" to "fwrite(&new, sizeof(int), 1, f);" :razz:

LaurV 2012-08-04 22:07

[QUOTE=Dubslow;306919]Are you using 2.03 or 2.04? The latter should test the average roundoff at the beginning of the test and select a higher length if the roundoff is too high.[/QUOTE]
I use the latest 2.04 beta. You can try for yourself, I just found a nice exponent: 27290759. CL selects 1440k for it. Everything is ok until just before 2.5M iterations (takes about an hour and a half), where it gets an error bigger than .35 and gets angry. This is the exponent with the "lowest" error-iter-count. Of course, to speed things up, I can provide intermediate save files, just 2, 3, 5 minutes before the error. Honestly, I think it doesn't make sense; it is clear that a bigger FFT should be used for this range.

My concern is that people who don't use manual tuning of the FFT (and rely on auto selection) will lose time and run millions of iterations in vain, as long as the program can't "increase" the FFT "on the fly" in a "clever" (transparent) way for the user, as P95 does, in such cases.

Regarding the average errors: when I said "average" I meant the average of the values displayed on screen. I knew the values represent the "max error" for a range.

I haven't put an eye into your program, it is 5:05 AM here, and I need to get some sleep for this night (I had something to work on). Anyhow, the two expos are almost done after restart with the new FFT size. Next time maybe.

Dubslow 2012-08-04 22:12

Okay, could you please post those save files? I'll put error-handling on the list for 2.05. Whenever you're awake, please also give the program a test. I'll run my own test, but the more the merrier (especially since we're on different platforms). If we can in fact change FFT lengths like that, then it'll make the error handling a lot easier.

Just to be clear, here's the todo list for 2.05:
[code]4. Separate check() from a hypothetical test(int q, char* expectres, int iters) function used for self testing and roundoff testing
4a. Refine print_bits() to print only the residue (related to 9a)
10. Extending 4, implement logging abilities.
6. Add V5UserID and ComputerID ini file options for later use
8. Figure out compiling arches/versions
9. Add option to not print residues at checkpoints? Option to skip extra initial error checking?
11. Extend the self test to include a lot more expos to test all FFT lengths, as well as near crossovers.
11a. Get a better idea of where crossovers are necessary.
12. Reinstall signal handler
13. Add an error handler for deep in the test
14. Print overall maxerr at end of test[/code]
Because of the triviality of 12 and 14, I'll do those after the filelocking gets fixed. (Still need a Windows compiler while flash is MIA...)

flashjh 2012-08-05 02:31

[QUOTE=Dubslow;306951](Still need a Windows compiler while flash is MIA...)[/QUOTE]
I should be able to get back into things. Sorry for the disappearance; things got really crazy around here and I've barely been able to keep up with factoring work.

Dubslow 2012-08-05 02:38

[QUOTE=flashjh;306976]I should be able to get back into things. Sorry for the disappearance; things got really crazy around here and I've barely been able to keep up with factoring work.[/QUOTE]

Just in time :smile: It seems you've survived whatever mess it is/was.

As for debugging, my only suggestion would be try the version I originally committed with Bdot's function definitions and see if that works. I think I saw someone mention it in that mess way up in the thread, but the mess was so confusing, especially since I wasn't following it too closely... take your time, I spose :smile:

kladner 2012-08-11 14:35

I just completed a sixth DC with 'CUDALucas-2.04-Beta-3.2-sm_13-x64', at least for the last 2-3. I have checked residues before reporting and all have matched.

Am I correct in thinking that the version above (3.2-sm_13) is preferred? This is running on a GTX 460 with driver 285.62.

flashjh 2012-08-11 14:38

@dubslow: what needs to be done now?

Dubslow 2012-08-11 14:43

[QUOTE=kladner;307655]
Am I correct in thinking that the version above (3.2-sm_13) is preferred? This is running on a GTX 460 with driver 285.62.[/QUOTE]
Whichever of the versions is fastest. If you'd like, try them all and tell us which is best :smile:

[QUOTE=flashjh;307656]@dubslow: what needs to be done now?[/QUOTE]
The filelocking on Windows (i.e. the lock file isn't getting deleted). AFAIK, that was never fixed (but I'd be happy to be wrong :smile:).

kladner 2012-08-11 14:52

[QUOTE=Dubslow;307657]Whichever of the versions is fastest. If you'd like, try them all and tell us which is best :smile:.............[/QUOTE]

I'll experiment with that if there's no problem with changing between "x.x-sm_x" varieties in mid run.

Dubslow 2012-08-11 15:43

[QUOTE=kladner;307658]I'll experiment with that if there's no problem with changing between "x.x-sm_x" varieties in mid run.[/QUOTE]

I can't see why it'd make a difference. :smile:

flashjh 2012-08-11 16:02

[QUOTE=Dubslow;307660]I can't see why it'd make a difference. :smile:[/QUOTE]

Yes, you can switch between them.

kladner 2012-08-11 16:26

OK. Thanks guys. I went on and started the comparisons since I realized that I've got all check files saved.

So far, 3.2-sm_13 is better than 4.0-sm_20 by maybe half a millisecond. I can't say exactly until I run the latter again. I let it get to the second report, but then absent-mindedly copied the time for the first.

kladner 2012-08-11 17:57

Here's what I got:
[CODE]GTX 460 (GF104) @ 715MHz (factory OC), Polite 15, Priority Normal, GPU usage 98-99%
Driver 285.62

M27680xxx FFT 1536K

CUDALucas-2.04-Beta-3.2-sm_13-x64 5.5720 ms/iter
CUDALucas-2.04Beta-4.0-sm_20-x64 6.0391 ms/iter
CUDALucas-2.04Beta-4.1-sm_21-x64 6.1662 ms/iter
CUDALucas-2.04-Beta-4.2-sm_30-x64 device_number >= device_count ... exiting
(This is probably a driver problem)[/CODE]I would upgrade the drivers except I just fought my way out of a can of worms with the correct installation of nVidia drivers. They still aren't quite right. From the first, the GTX 570 has the boxes in GPUZ for CUDA and DirectCompute unchecked, even though CUDA is clearly working with mfaktc on the card. When the 460 was absent the 570 reported correctly.

TObject 2012-08-21 18:25

I upgraded from CUDALucas-2.04 Beta-4.1-sm_21-x64.exe to CUDALucas-2.04 Beta-4.2-sm_30-x64.exe and now I am getting the following error:

CUDALucas.cu(163) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED

I thought I had bad CUDA DLLs, but I downloaded fresh ones from the recommended site, and I still get the error.

Please advise.

Thank you.

Dubslow 2012-08-22 01:21

What card are you using it on? If you're using sm_30 then you need at least a Kepler. Use the architecture which your card belongs to. [45][78]0s are 2.0, [45][1-6]0 are 2.1, and everything <= GTX 2** are 1.x.

TObject 2012-08-22 01:24

I use a GTX 580, but I am upgrading to 4.2 because the latest version of mfaktc uses 4.2.

I run CUDALucas and mfaktc side-by-side and I had a [url=http://www.mersenneforum.org/showthread.php?t=16993]problem[/url] with mismatched CUDA versions.

Thank you.

Dubslow 2012-08-22 01:31

[QUOTE=TObject;308850]I use a GTX 580, but I am upgrading to 4.2 because the latest version of mfaktc uses 4.2.
Thank you.[/QUOTE]

Okay, but you'll still need a CUDA_4.2-sm_20 executable. flash doesn't compile those AFAIK. You have to choose one with sm <= 20. If you want CUDA_4.2-sm_20, you'll have to compile it yourself or ask flash to do it.

Note that there probably won't be a performance increase from switching CUDA versions.

TObject 2012-08-22 01:33

I see. Thank you.

flashjh 2012-08-22 01:36

I can compile just about any combination, but the problem is what Dubslow already pointed out: you won't see much of an improvement.

The version that is fastest on all of my 580s is CUDA_3.2 | sm_1.3. You should try that one and let us know how it works for you.

flashjh 2012-08-28 05:02

New versions of 2.04 beta are uploaded [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]here[/URL]. They are based on r32 from SourceForge which is the baseline update discussed that included the filelocking. I had to make some minor adjustments to CUDALucas.cu and Parse.c to get it to compile, but I did not make any changes to the functions.

Without having to figure out what went wrong right now, as I was reviewing the changes between r32 and r37 to make it compile, I found that the modified open_s function that I added for MSVS, which used _sopen_s, was probably wrong and caused the problem... hopefully.

I have committed the changes to r38.

Everyone please test this build for the filelocking error. Thanks!

[QUOTE=Dubslow;308853]Okay, but you'll still need a CUDA_4.2-sm_20 executable. flash doesn't compile those AFAIK.[/QUOTE]

I compiled a 4.2 | sm_20 version, in case you need it.

TheJudger 2012-08-28 09:36

Hi,

[QUOTE=TObject;308850]I use a GTX 580, but I am upgrading to 4.2 because the latest version of mfaktc uses 4.2.

I run CUDALucas and mfaktc side-by-side and I had a [url=http://www.mersenneforum.org/showthread.php?t=16993]problem[/url] with mismatched CUDA versions.

Thank you.[/QUOTE]

[QUOTE=flashjh;308855]I can compile just about any combination, but the problem is what Dubslow already pointed out: you won't see much of an improvement.

The version that is fastest on all of my 580s is CUDA_3.2 | sm_1.3. You should try that one and let us know how it works for you.[/QUOTE]

[B]In theory[/B] all you need is a driver which is capable of CUDA 4.2 or newer. Then you download and unpack mfaktc 0.19 in one directory; mfaktc has the correct runtime libs included in the download. Then download CUDALucas and put the right runtime libs into the CUDALucas directory. You don't need to install the CUDA toolkit.
There is only one point where I'm unsure: I don't know whether you can run both apps (with different CUDA versions) concurrently or not.

Oliver

Dubslow 2012-08-28 18:09

[QUOTE=flashjh;309485]New versions of 2.04 beta are uploaded [URL="https://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]here[/URL]. They are based on r32 from SourceForge which is the baseline update discussed that included the filelocking. I had to make some minor adjustments to CUDALucas.cu and Parse.c to get it to compile, but I did not make any changes to the functions.

Without having to figure out what went wrong right now, as I was reviewing the changes between r32 and r37 to make it compile, I found that the modified open_s function that I added for MSVS, which used _sopen_s, was probably wrong and caused the problem... hopefully.

I have committed the changes to r38.

Everyone please test this build for the filelocking error. Thanks![/QUOTE]
:tu:

Edit: r33 and r37 had some changes, including updated FFT lengths. Those need to be reincorporated. I'll try and make those into an r39.

Dubslow 2012-08-28 19:41

Okay, slight change of plans: I recall LaurV somewhere saying that a larger FFT length was faster than some smaller ones in CUDALucas' table, but I wasn't able to relocate that post. In addition, I will also add the signal-handling fix discussed before to r39.

In the meantime, all Windows users should test flash's latest compile for the filelocking bug; note, however, that compared to earlier beta releases, some FFT lengths might not appear. If the bug is confirmed killed, then the final release (non-beta) of 2.04 will reincorporate the changes from the old binary lost in the new ones (i.e., it will be r39). r39 will be committed when LaurV responds.

flashjh 2012-08-28 23:04

[QUOTE=Dubslow;309543]Edit: r33 and r37 had some changes, including updated FFT lengths. Those need to be reincorporated. I'll try and make those into an r39.[/QUOTE]

[QUOTE=Dubslow;309566]Okay, slight change of plans...[/QUOTE]
It had been so long, I couldn't remember what was done/not done. I remember making the FFT table changes now. I can help reincorporate, if you want, or just let me know when R39 is ready and I'll compile it.

LaurV 2012-08-29 10:42

[QUOTE=TheJudger;309504]
There is only one point where I'm unsure: I don't know whether you can run both apps (with different CUDA versions) concurrently or not.[/QUOTE]
You can't. I use different sm's for CL and mfaktc; they run perfectly as long as I don't mix them on the same card. I can mix them in the computer at the same time if they target different cards and the cards are not SLI. Keeping many versions on the computer at the same time is easy: you only put the right DLLs in each folder, as both mfaktc and CL look in their own folder for the DLL if it is not loaded. But you can't RUN two versions on the same card at the same time.


All times are UTC.