mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

LaurV 2012-08-29 11:09

[QUOTE=Dubslow;309566]Okay, slight change of plans: I recall LaurV somewhere saying that a larger FFT length was faster than some smaller ones in CUDALucas' table, but I wasn't able to relocate that post. In addition, I will also add the signal-handling fix discussed before to r39.

In the meantime, all Windows users should test flash's latest compile for the filelocking bug; note, however, that compared to earlier beta releases, some FFT lengths might not appear. If the bug is confirmed killed, then the final release (non-beta) of 2.04 will reincorporate the changes from the old binary lost in the new ones (i.e., it will be r39). r39 will be committed when LaurV responds.[/QUOTE]

Yes, I have the bad habit of tuning the FFT size manually for every range, and sometimes this has saved me hours of CL-ing. For example (IIRC; if not, I will give exact confirmation tomorrow. I have had no internet at my house since Friday morning, and I still have to go home this evening, in half an hour at most, to check that I am not mistaking the numbers. Here I have internet, but no GPU), 2304k (smooth as 2^18*3^2) is much faster than the smaller ones (about 6% faster than the default one), and 2592k (smooth as 2^15*3^4) is about 14% faster (NO JOKE!) than the default one (can't remember which, maybe 2400k, maybe 2568k, or the next higher one). I have an Excel table somewhere; if the net problem can't be solved soon, I will bring it here (the table is not complete, just for the ranges I had expos to test, i.e. 25M to 46M expos, but it is very detailed).

And the cards are gtx580, gtx570, tesla c2050, with no difference between them. An FFT with finer granulation (a smoother length) is almost always faster than a smaller FFT with coarser granulation (a less smooth length), with very few exceptions. 1440k is one such exception: it is only 5-smooth but still very fast! Above 1440k, the default FFT size can almost always be tuned to a better one. I can't say for sure whether this is card/OS/whatever dependent. Someone should try FFT 2592k against the smaller defaults on a gtx580 on Linux. I constantly get (besides smaller/safer rounding errors) a speed improvement of 13-14% on win64/gtx580 (which is my main setup). This translates into 46-49 hours for a 4xM expo, instead of 52-55 hours.

edit: I am going home now, but you can search the forum for "2592k"; I am 100% sure of this number (it seems to be a multiple of only 2 and 3 too :D), and you should find my former posts. Trust the numbers in those posts over the numbers in the current post.

Dubslow 2012-08-29 20:23

[QUOTE=LaurV;309608]Yes, I have the bad habit of tuning the FFT size manually for every range, and sometimes this has saved me hours of CL-ing. For example (IIRC; if not, I will give exact confirmation tomorrow. I have had no internet at my house since Friday morning, and I still have to go home this evening, in half an hour at most, to check that I am not mistaking the numbers. Here I have internet, but no GPU), 2304k (smooth as 2^18*3^2) is much faster than the smaller ones (about 6% faster than the default one), and 2592k (smooth as 2^15*3^4) is about 14% faster (NO JOKE!) than the default one (can't remember which, maybe 2400k, maybe 2568k, or the next higher one). I have an Excel table somewhere; if the net problem can't be solved soon, I will bring it here (the table is not complete, just for the ranges I had expos to test, i.e. 25M to 46M expos, but it is very detailed).

And the cards are gtx580, gtx570, tesla c2050, with no difference between them. An FFT with finer granulation (a smoother length) is almost always faster than a smaller FFT with coarser granulation (a less smooth length), with very few exceptions. 1440k is one such exception: it is only 5-smooth but still very fast! Above 1440k, the default FFT size can almost always be tuned to a better one. I can't say for sure whether this is card/OS/whatever dependent. Someone should try FFT 2592k against the smaller defaults on a gtx580 on Linux. I constantly get (besides smaller/safer rounding errors) a speed improvement of 13-14% on win64/gtx580 (which is my main setup). This translates into 46-49 hours for a 4xM expo, instead of 52-55 hours.

edit: I am going home now, but you can search the forum for "2592k"; I am 100% sure of this number (it seems to be a multiple of only 2 and 3 too :D), and you should find my former posts. Trust the numbers in those posts over the numbers in the current post.[/QUOTE]

I would love to see the spreadsheet. For what it's worth, here's all [STRIKE]four[/STRIKE] five lines of how CUDALucas chooses a length:

[code] #define COUNT 119
int multipliers[COUNT] = { 6, 8, 12, 16, 18, 24, 32,
40, 48, 64, 72, 80, 96, 120,
128, 144, 160, 192, 224, 240, 256,
288, 320, 336, 384, 448, 480, 512,
576, 640, 672, 768, 800, 864, 896,
960, 1024, 1120, 1152, 1200, 1280, 1344,
1440, 1536, 1600, 1680, 1728, 1792, 1920,
2048, 2240, 2304, 2400, 2560, 2688, 2880,
3072, 3200, 3360, 3456, 3584, 3840, 4000,
4096, 4480, 4608, 4800, 5120, 5376, 5600,
5760, 6144, 6400, 6720, 6912, 7168, 7680,
8000, 8192, 8960, 9216, 9600, 10240, 10752,
11200, 11520, 12288, 12800, 13440, 13824, 14366,
15360, 16000, 16128, 16384, 17920, 18432, 19200,
20480, 21504, 22400, 23040, 24576, 25600, 26880,
29672, 30720, 32000, 32768, 34992, 36864, 38400,
40960, 46080, 49152, 51200, 55296, 61440, 65536 };
// Largely copied from Prime95's jump tables, up to 32M
// Support up to 64M, the maximum length with threads == 1024
...
int len, i, estimate = q/20;
for(i = 0; i < COUNT; i++) {
    len = 1024*multipliers[i];
    if( len >= estimate )
    {
        return len;
    }
}[/code]
If you say larger lengths are faster, it should just be a matter of removing the slower lengths from the table.
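
To make the selection rule and LaurV's smoothness argument concrete, here is a minimal sketch. It copies only a slice of the table above, and the function names `choose_fft_length` and `largest_prime_factor` are made up for illustration; this is not the CUDALucas source.

```c
#include <stddef.h>

/* Slice of the multiplier table quoted above (units of 1024 words).
   The real table has 119 entries; only a few are copied here. */
static const int multipliers[] = {
    1920, 2048, 2240, 2304, 2400, 2560, 2688, 2880, 3072
};
#define N_MULT (sizeof multipliers / sizeof multipliers[0])

/* Same rule as the quoted loop: first table length >= q/20.
   Returns 0 if q is too large for this slice. */
int choose_fft_length(int q)
{
    int estimate = q / 20;
    for (size_t i = 0; i < N_MULT; i++) {
        int len = 1024 * multipliers[i];
        if (len >= estimate)
            return len;
    }
    return 0;
}

/* Largest prime factor of n (n >= 2), by trial division.
   A length is "3-smooth" if this returns at most 3, and so on. */
int largest_prime_factor(int n)
{
    int largest = 1;
    for (int d = 2; d * d <= n; d++) {
        while (n % d == 0) {
            largest = d;
            n /= d;
        }
    }
    return n > 1 ? n : largest;
}
```

With this rule, `choose_fft_length(46069867)` returns 2304*1024, i.e. 2304K. `largest_prime_factor` gives 3 for both 2304 and 2592, and 7 for 2240. It also gives 653 for 14366 and 3709 for 29672, so those two table entries are not smooth at all; they may be typos for 14336 and 28672, though that is only a guess.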

flashjh 2012-08-30 04:58

We need a switch like -fft that does more than just start at q/20 and increase until >=. When enabled, it can test several FFT lengths, log the time and error for each, and then select the best one for that particular exponent. If a worktodo file is used, it runs the FFT test when each exponent is started. Once an FFT is selected, it will need to be able to put the FFT into the worktodo file for that exponent. The main problem is how many FFTs to test before it's a waste of time. (If LaurV's suggestion can be vetted, it may be possible to narrow down the FFTs to a small enough number to test all of them each time.) Once enough test data is collected and reviewed, it may be possible to have the program select a particular set of FFTs to test based on the exponent and GPU chipset.
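
The selection step of such a switch could look roughly like the sketch below. Everything in it is hypothetical: the `fft_trial` struct, the `pick_best_trial` name and the error limit are assumptions for illustration, and the timing and error numbers would have to come from real trial runs.

```c
#include <stddef.h>

/* One benchmark result for a trial FFT length (all fields hypothetical). */
struct fft_trial {
    int    length_k;     /* FFT length in K (1024-word units) */
    double ms_per_iter;  /* measured time per iteration */
    double max_err;      /* worst round-off error seen during the trial */
};

/* Return the index of the fastest trial whose error stays below err_limit,
   or -1 if no trial qualifies. */
int pick_best_trial(const struct fft_trial *t, size_t n, double err_limit)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (t[i].max_err >= err_limit)
            continue;  /* round-off too high, not a safe choice */
        if (best < 0 || t[i].ms_per_iter < t[best].ms_per_iter)
            best = (int)i;
    }
    return best;
}
```

Filtering on error first and speed second means a larger but smoother length can win, which is the case LaurV describes.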

One thing I noticed: when the .ini file contains a particular FFT length and the program needs to change FFT sizes, it always goes up. However, I was testing smaller exponents that needed smaller FFTs (it took me a while to figure out why the program was failing; then I remembered the FFT size in the .ini file). The test mentioned above could also be used to select correct FFTs for all exponents if the default FFT is too big for the exponent (which caused serious rounding errors). (I guess if the -fft switch can be implemented, there will be no reason to specify FFTs in the .ini file. One could still put an incorrect FFT in the worktodo, though.)


Thoughts?

------
So far, testing of the new 2.04 beta is going well, for me. I was able to place many smaller exponents in the worktodo file and they all continued just fine. My DC still has a while left though...

[B]How is the testing going for everyone else?[/B]

kladner 2012-08-30 13:21

[QUOTE=flashjh;309712]
------
So far, testing of the new 2.04 beta is going well, for me. I was able to place many smaller exponents in the worktodo file and they all continued just fine. My DC still has a while left though...

[B]How is the testing going for everyone else?[/B][/QUOTE]

I have successfully completed 13 DC's and 2 LL's with 2.04-Beta-3.2-sm_13-x64. I think there were two times when I saw the Corrupt Save File cause a restart. I spotted these pretty quickly and was able to resume from a very recent good Save File with little lost work time.

flashjh 2012-08-30 15:23

[QUOTE=kladner;309735]I have successfully completed 13 DC's and 2 LL's with 2.04-Beta-3.2-sm_13-x64. I think there were two times when I saw the Corrupt Save File cause a restart. I spotted these pretty quickly and was able to resume from a very recent good Save File with little lost work time.[/QUOTE]

Have you switched to the updated 2.04 beta? Have you had any file locking problems with the new one?

kladner 2012-08-30 16:29

Would this creation date be the latest?[INDENT]Friday, August 03, 2012, 9:21:17 AM
[/INDENT]I just downloaded it to be sure, but the one I was running has the same date. So... I guess I have probably been running the latest version.

I confess that I do not entirely understand the file locking issue.

I think most or all of the savefile corruption episodes were associated with unrelated (I think) BSODs. I have not seen CL restart (corrupt savefile) in the last 5-6 runs.

Please ask if there's other data you want.

Thanks to flash and dubslow (EDIT: and LaurV!) for all their work on this project. Bravo, Guys!

Dubslow 2012-08-30 18:01

[QUOTE=kladner;309748]Thanks to flash and dubslow (EDIT: and LaurV!) for all their work on this project. Bravo, Guys![/QUOTE]
Don't forget msft! He does all the mathy stuff :smile:

kladner 2012-08-30 18:23

[QUOTE=Dubslow;309762]Don't forget msft! He does all the mathy stuff :smile:[/QUOTE]

That's always the hazard of giving credit: leaving someone out.

Thanks msft! Sorry for the omission.

flashjh 2012-08-30 18:24

[QUOTE=kladner;309748]Would this creation date be the latest?[INDENT]Friday, August 03, 2012, 9:21:17 AM
[/INDENT][/QUOTE][INDENT]Go [URL="http://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]here[/URL]. The latest build is 28 Aug 2012
[/INDENT][QUOTE=Dubslow;309762]Don't forget msft! He does all the mathy stuff :smile:[/QUOTE]
Agree, and many others! I just make it compile on Windows :smile:

kladner 2012-08-30 18:29

Thanks Jerry. Done!

prime7989 2012-09-01 03:37

How to modify CUDALucas slightly
 
[QUOTE=flashjh;309768][INDENT]Go [URL="http://sourceforge.net/projects/cudalucas/files/2.04%20Beta/"]here[/URL]. The latest build is 28 Aug 2012
[/INDENT]Agree, and many others! I just make it compile on Windows :smile:[/QUOTE]
How and where in the code can one modify the CUDALucas sources to accept Fermat numbers
Fn=2^2^n+1, with x[0]=5.0 and everything else the same as in the original CUDALucas?
Change the input from 2^p-1 to 2^2^n+1.
Change the start to S_0 = 5.0, keeping the recursive formula the same: S_(k+1)=(S_k)^2-2.
Change the number of iterations from p-2 to 2^n-2.
Change the modulus from 2^p-1 to Fn=2^2^n+1.
As before, check that S_(2^n-2)==0 (mod Fn)
instead of S_(p-2)==0 (mod Mp), where Mp=2^p-1.
Can anyone please modify the CUDALucas code and post it?
I can try to check the code.
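
The recipe above can be tried on tiny cases with ordinary integer arithmetic. The sketch below is only a toy: it uses 64-bit integers and a slow shift-and-add multiply, so it stops at n = 5, and it says nothing about how an FFT-based program would do the reduction modulo Fn. The names `mulmod` and `fermat_ll_test` are invented for this example.

```c
#include <stdint.h>

/* (a * b) mod m by double-and-add; safe while m < 2^63. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t r = 0;
    a %= m;
    while (b) {
        if (b & 1)
            r = (r + a) % m;
        a = (a + a) % m;
        b >>= 1;
    }
    return r;
}

/* Returns 1 if S_(2^n - 2) == 0 (mod F_n), else 0, for 1 <= n <= 5.
   Returns -1 outside that range (F_6 no longer fits in 64 bits). */
int fermat_ll_test(unsigned n)
{
    if (n < 1 || n > 5)
        return -1;
    uint64_t e = (uint64_t)1 << n;           /* 2^n */
    uint64_t f = ((uint64_t)1 << e) + 1;     /* F_n = 2^(2^n) + 1 */
    uint64_t s = 5 % f;                      /* S_0 = 5 */
    for (uint64_t k = 0; k < e - 2; k++)     /* 2^n - 2 squarings */
        s = (mulmod(s, s, f) + f - 2) % f;   /* S_(k+1) = S_k^2 - 2 */
    return s == 0;
}
```

For n = 1 to 4 the function returns 1 (F1 = 5, F2 = 17, F3 = 257 and F4 = 65537 are prime), and for n = 5 it returns 0 (F5 = 641 * 6700417).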

Dubslow 2012-09-01 04:15

[QUOTE=prime7989;309895]How and where in the code can one modify the CUDALucas sources to accept Fermat numbers
Fn=2^2^n+1, with x[0]=5.0 and everything else the same as in the original CUDALucas?
Change the input from 2^p-1 to 2^2^n+1.
Change the start to S_0 = 5.0, keeping the recursive formula the same: S_(k+1)=(S_k)^2-2.
Change the number of iterations from p-2 to 2^n-2.
Change the modulus from 2^p-1 to Fn=2^2^n+1.
As before, check that S_(2^n-2)==0 (mod Fn)
instead of S_(p-2)==0 (mod Mp), where Mp=2^p-1.
Can anyone please modify the CUDALucas code and post it?
I can try to check the code.[/QUOTE]

I can do everything except the modulus part. I don't understand [URL="http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm"]SS[/URL] and its implementation enough to do that. Try sending a private message to [URL="http://www.mersenneforum.org/private.php?do=newpm&u=9446"]msft[/URL], who does all the mathy things (he [URL="http://www.mersenneforum.org/showthread.php?t=12576"]created[/URL] CUDALucas).

prime7989 2012-09-01 05:13

[QUOTE=Dubslow;309899]I can do everything except the modulus part. I don't understand [URL="http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm"]SS[/URL] and its implementation enough to do that. Try sending a private message to [URL="http://www.mersenneforum.org/private.php?do=newpm&u=9446"]msft[/URL], who does all the mathy things (he [URL="http://www.mersenneforum.org/showthread.php?t=12576"]created[/URL] CUDALucas).[/QUOTE]
Dear Dubslow,
SS is just the implementation of the FFT for moduli of the form 2^m+1, where m=2^n.
Please do not worry about SS. If you can, could you implement what I suggested without SS and post it? I will check it and post results.
Thanks!!!

LaurV 2012-09-01 07:23

That would be completely futile (sorry for playing RDS, sometimes I really miss him :razz:) from the memory limitation point of view. The Fermat numbers whose primality status is unknown are already too big to be tested by this method; even with a Tesla K20 (assuming it hits the market next year, or at the end of this year) you would need many of them...

msft 2012-09-01 08:45

Hi, prime7989
You can use genefer.c; Fermat numbers are a subset of GFNs.

ET_ 2012-09-01 14:22

[QUOTE=prime7989;309895]How and where in the code can one modify the CUDALucas sources to accept Fermat numbers
Fn=2^2^n+1, with x[0]=5.0 and everything else the same as in the original CUDALucas?
Change the input from 2^p-1 to 2^2^n+1.
Change the start to S_0 = 5.0, keeping the recursive formula the same: S_(k+1)=(S_k)^2-2.
Change the number of iterations from p-2 to 2^n-2.
Change the modulus from 2^p-1 to Fn=2^2^n+1.
As before, check that S_(2^n-2)==0 (mod Fn)
instead of S_(p-2)==0 (mod Mp), where Mp=2^p-1.
Can anyone please modify the CUDALucas code and post it?
I can try to check the code.[/QUOTE]

I don't like to play the devil, but did you consider that F26 would be bigger than the exponents currently released by PrimeNet, and that F30 would require more than one year of computation? Nearly all of them (those from F5 to F30) are already known to be composite...

Luigi

prime7989 2012-09-01 16:39

[QUOTE=msft;309916]Hi, prime7989
You can use genefer.c; Fermat numbers are a subset of GFNs.[/QUOTE]
Hi, msft,
genefer.c is a probabilistic test, not a deterministic one.
Do you know of any deterministic software of the Pepin's or LLV type?

LaurV 2012-09-01 16:39

@ET_: That was what I said in post #1566, but you put it into numbers :D

prime7989 2012-09-01 17:19

Faster Algorithms
 
[QUOTE=ET_;309934]I don't like to play the devil, but did you consider that F26 would be bigger than the exponents currently released by PrimeNet, and that F30 would require more than one year of computation? Nearly all of them (those from F5 to F30) are already known to be composite...

Luigi[/QUOTE]
Hi msft, Luigi, Dubslow,
The goal is to find faster and faster mathematically correct deterministic algorithms for Fermat and Mersenne numbers and other types of numbers, so the existing software would be used as a benchmark on already known large Fermat or Mersenne numbers/primes. So if anyone could point out where in the CUDALucas code Mp=2^p-1 is calculated and the reduction modulo Mp is done, I would be grateful. Code snippets would be useful. Of course, I do not know much about the IBDWT.
Thank you! Of course, the lower bound being super-log time and the upper bound a poly-based AKS algorithm.

Dubslow 2012-09-01 17:47

[QUOTE=prime7989;309949]Hi msft, Luigi, Dubslow,
The goal is to find faster and faster mathematically correct deterministic algorithms for Fermat and Mersenne numbers and other types of numbers, so the existing software would be used as a benchmark on already known large Fermat or Mersenne numbers/primes. So if anyone could point out where in the CUDALucas code Mp=2^p-1 is calculated and the reduction modulo Mp is done, I would be grateful. Code snippets would be useful. Of course, I do not know much about the IBDWT.
Thank you! Of course, the lower bound being super-log time and the upper bound a poly-based AKS algorithm.[/QUOTE]

The FFT is done by the CuFFT library, meaning none of the code is actually for the FFTs. If you go [URL="http://sourceforge.net/p/cudalucas/code/38/tree/trunk/CUDALucas.cu?force=True"]here[/URL] and Ctrl+F for "lucas_square (d" you can find the code that runs the test, and see the GPU kernel calls. rdft() does the FFT, and the normalize* kernels are the part of the SS algorithm that happens after the FFTs. (Yes, this does use the SS algorithm. It's the fastest practical multiplication algorithm.)
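
For contrast with the FFT-based code, here is what "squaring modulo Mp" looks like when written with plain integers. It is a toy sketch for odd primes p up to 31, not a description of CUDALucas: in an FFT-based program the reduction is typically implicit in the transform rather than an explicit step, which is an assumption the posts above do not spell out. The function names are made up for this example.

```c
#include <stdint.h>

/* Reduce x modulo M = 2^p - 1 using only shifts, masks and adds. */
static uint64_t mod_mersenne(uint64_t x, unsigned p)
{
    uint64_t m = ((uint64_t)1 << p) - 1;
    while (x > m)
        x = (x & m) + (x >> p);  /* 2^p == 1 (mod M): fold the high bits down */
    return x == m ? 0 : x;
}

/* Lucas-Lehmer test for an odd prime p with 3 <= p <= 31.
   Returns 1 if 2^p - 1 is prime, 0 otherwise. */
int lucas_lehmer(unsigned p)
{
    uint64_t m = ((uint64_t)1 << p) - 1;
    uint64_t s = 4;                             /* S_0 = 4 */
    for (unsigned k = 0; k < p - 2; k++) {      /* p - 2 iterations */
        uint64_t sq = mod_mersenne(s * s, p);   /* s < 2^31, so s*s fits in 64 bits */
        s = (sq + m - 2) % m;                   /* S_(k+1) = S_k^2 - 2 (mod M) */
    }
    return s == 0;
}
```

The point of the fold in `mod_mersenne` is that no division by Mp is ever needed, which is one reason Mersenne numbers are convenient for this kind of test.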

ET_ 2012-09-01 18:55

[QUOTE=prime7989;309949]Hi msft, Luigi, Dubslow,
The goal is to find faster and faster mathematically correct deterministic algorithms for Fermat and Mersenne numbers and other types of numbers, so the existing software would be used as a benchmark on already known large Fermat or Mersenne numbers/primes. So if anyone could point out where in the CUDALucas code Mp=2^p-1 is calculated and the reduction modulo Mp is done, I would be grateful. Code snippets would be useful. Of course, I do not know much about the IBDWT.
Thank you! Of course, the lower bound being super-log time and the upper bound a poly-based AKS algorithm.[/QUOTE]

To help you, we need to know how you plan to work.
Are you going to look for a better algorithm, or just trying to optimise the existing one?
In the first case, there is no known CODED algorithm that actually compares to LL.
In the second case, you have no chance (as LaurV pointed out), because the kernel of the LL algorithm is already coded into the library.

Finally, if you just want to exchange the Mersenne test with a Fermat test, be prepared to fail, because as soon as the test approaches F31, the program will burst into pieces for lack of physical memory...

Luigi

prime7989 2012-09-01 22:52

[QUOTE=ET_;309960]To help you, we need to know how you plan to work.
Are you going to look for a better algorithm, or just trying to optimise the existing one?
In the first case, there is no known CODED algorithm that actually compares to LL.
In the second case, you have no chance (as LaurV pointed out), because the kernel of the LL algorithm is already coded into the library.

Finally, if you just want to exchange the Mersenne test with a Fermat test, be prepared to fail, because as soon as the test approaches F31, the program will burst into pieces for lack of physical memory...

Luigi[/QUOTE]

Dear Luigi,
Perhaps optimization might be a better choice of words. However, since the GTX680 has only 2 GB of RAM and the GTX660Ti at most 3 GB, the exponent of 2, i.e. 2^n, is, as you say, limited to 2^31. But if the kernel of the LL is in the CUDA library, then what is passed to it? I wish to pass 2^2^n+1 to it with 2<=n<=31.
Thank you

kladner 2012-09-01 23:45

CUDALucas on GTX 570
 
[QUOTE=flashjh;309745]Have you switched to the updated 2.04 beta? Have you had any file locking problems with the new one?[/QUOTE]

This was the best quote I could use to link into my current situation.

I decided, foolishly or otherwise, to swap the work being done by the two GPUs. I wanted to see what the GTX 570 would do with CL. Also, giving mfaktc back to the 460 would free at least one CPU core for P-1. After some effort I got two instances of mfaktc running on the 460.

However, CL on the 570 still eludes me. I did some searching on the file lock situation, and I don't think that is what is happening. I think I have all the latest versions from SourceForge, as well as the libraries. There have been some different results. The one I've captured, which is repeatable ATM, goes through the FFT length tests about 20 times and ends with[CODE]CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED[/CODE]This happens with CUDALucas-2.04-Beta-4.0-sm_20-x64 and with CUDALucas-2.04-Beta-3.2-sm_13-x64. I have removed check files between tests. I have never seen a lock file in the CUDALucas folder.

flashjh 2012-09-02 01:48

[QUOTE=kladner;309986]However, CL on the 570 still eludes me.[/QUOTE]
Ok, I was able to duplicate this error. I don't know if this is the only way to cause this error, but it's worth a shot:

Check your CUDALucas.ini file for a set FFT length. If it has one, set it to zero and try again. Or, if you're specifying an FFT length, make sure it's not too large. I would recommend using a clean .ini file from SourceForge.

If this doesn't help, please run CUDALucas like this:

CUDALucas.exe > output.txt and post the .txt file so we can see the output.

Jerry

kladner 2012-09-02 02:00

Thanks, Jerry. I will get on the tests you suggest. I can say that FFT is set to 0 in the ini. I'll try with a clean ini (excepting Device=).

EDIT: Here are some results.[CODE]E:\CUDA>CUDALucas-2.04-Beta-3.2-sm_13-x64 > output.txt
mkdir: cannot create directory `savefiles': File exists
CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED[/CODE]Output.txt is attached.

EDIT2: I did set Interactive, CheckRoundoffAllIterations, and SaveAllCheckpoints=1. The Device happened to be 0.

kladner 2012-09-02 02:52

second test, 4.0-sm20
 
Here's another run.
[CODE]E:\CUDA>CUDALucas-2.04-Beta-4.0-sm_20-x64 > output.txt
mkdir: cannot create directory `savefiles': File exists
CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED

output.txt:

Starting M46069867 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M46069867 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2400K
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2560K
Starting M46069867 fft length = 2880K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2880K
Starting M46069867 fft length = 3072K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3072K
Starting M46069867 fft length = 3200K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3200K
Starting M46069867 fft length = 3456K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3456K
Starting M46069867 fft length = 3840K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 48 < 1000 && err = 0.50000 >= 0.35, increasing n from 3840K
Starting M46069867 fft length = 4000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4000K
Starting M46069867 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4096K
Starting M46069867 fft length = 4608K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.49933 >= 0.35, increasing n from 4608K
Starting M46069867 fft length = 4800K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.49998 >= 0.35, increasing n from 4800K
Starting M46069867 fft length = 5120K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 5120K
Starting M46069867 fft length = 5760K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 5760K
Starting M46069867 fft length = 6144K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6144K
Starting M46069867 fft length = 6400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6400K
Starting M46069867 fft length = 6912K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6912K
Starting M46069867 fft length = 7680K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 7680K
Starting M46069867 fft length = 8000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 8000K
Starting M46069867 fft length = 8192K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.

[/CODE]
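
The sequence of lengths in this log (2304K, 2400K, 2560K, 2880K, ...) skips 2688K, 3360K and 3584K, all of which are multiples of 7. One way to reproduce that sequence is sketched below. This is inferred from the output only, not taken from the beta's source, and `next_fft_length_k` is a made-up name.

```c
#include <stddef.h>

/* Slice of the multiplier table quoted earlier in the thread (units of K). */
static const int table_k[] = {
    2304, 2400, 2560, 2688, 2880, 3072, 3200, 3360, 3456, 3584, 3840
};
#define N_TABLE (sizeof table_k / sizeof table_k[0])

/* Next table length strictly above cur_k that is not a multiple of 7,
   or 0 if this slice is exhausted. */
int next_fft_length_k(int cur_k)
{
    for (size_t i = 0; i < N_TABLE; i++) {
        if (table_k[i] > cur_k && table_k[i] % 7 != 0)
            return table_k[i];
    }
    return 0;
}
```

Starting from 2304 and calling the function repeatedly gives 2400, 2560, 2880, 3072, 3200, 3456, 3840, which matches the first eight lengths in the log.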

Dubslow 2012-09-02 02:57

Add a ",1920K" (or ",2048K" or ",2240K") to the end of the worktodo line and try again.
[code]Test=BLAH,46069867,72,1,1920K[/code]
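
For anyone scripting their worktodo files, a line in this shape can be picked apart with `sscanf`. The sketch below is a hypothetical parser, not the one in CUDALucas; the meaning given to the third and fourth fields (factoring depth and P-1 flag) is an assumption, and only the `Test=` prefix shown above is handled.

```c
#include <stdio.h>

/* Parse "Test=ID,exponent,field3,field4[,<len>K]".
   Returns 1 on success, 0 if the line does not match.
   *fft_k is set to 0 when no length suffix is present. */
int parse_worktodo_line(const char *line, long *exponent, int *fft_k)
{
    char id[64];
    long exp_val = 0;
    int tf_bits, pm1_done;   /* assumed meaning of fields 3 and 4 */
    int len = 0;
    char unit = '\0';

    int n = sscanf(line, "Test=%63[^,],%ld,%d,%d,%d%c",
                   id, &exp_val, &tf_bits, &pm1_done, &len, &unit);
    if (n < 4)
        return 0;            /* mandatory fields missing */
    *exponent = exp_val;
    *fft_k = (n == 6 && (unit == 'K' || unit == 'k')) ? len : 0;
    return 1;
}
```

Given the line above, this yields exponent 46069867 and an FFT length of 1920 (in K); without the trailing field it reports 0, meaning "let the program choose".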

kladner 2012-09-02 03:07

[QUOTE=Dubslow;310002]Add a ",1920K" (or ",2048K" or ",2240K") to the end of the worktodo line and try again.
[code]Test=BLAH,46069867,72,1,1920K[/code][/QUOTE]

[,1920K]
[CODE]E:\CUDA>CUDALucas-2.04-Beta-4.0-sm_20-x64 > output.txt
mkdir: cannot create directory `savefiles': File exists
CUDALucas.cu(693) : cudaSafeCall() Runtime API error 30: unknown error.

output.txt:


Starting M46069867 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1920K
Starting M46069867 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2048K
Starting M46069867 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M46069867 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2400K
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2560K
Starting M46069867 fft length = 2880K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2880K
Starting M46069867 fft length = 3072K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3072K
Starting M46069867 fft length = 3200K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3200K
Starting M46069867 fft length = 3456K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3456K
Starting M46069867 fft length = 3840K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3840K
Starting M46069867 fft length = 4000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4000K
Starting M46069867 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4096K
Starting M46069867 fft length = 4608K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4608K
Starting M46069867 fft length = 4800K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4800K
Starting M46069867 fft length = 5120K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.37068 >= 0.35, increasing n from 5120K
Starting M46069867 fft length = 5760K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 5760K
Starting M46069867 fft length = 6144K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6144K
Starting M46069867 fft length = 6400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6400K
Starting M46069867 fft length = 6912K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6912K
Starting M46069867 fft length = 7680K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 7680K
Starting M46069867 fft length = 8000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
[/CODE]

Dubslow 2012-09-02 03:14

[QUOTE=kladner;310003][,1920K]
[CODE]E:\CUDA>CUDALucas-2.04-Beta-4.0-sm_20-x64 > output.txt
mkdir: cannot create directory `savefiles': File exists
CUDALucas.cu(693) : cudaSafeCall() Runtime API error 30: unknown error.

output.txt:


Starting M46069867 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1920K
Starting M46069867 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2048K
Starting M46069867 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M46069867 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2400K
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2560K
Starting M46069867 fft length = 2880K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2880K
Starting M46069867 fft length = 3072K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3072K
Starting M46069867 fft length = 3200K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3200K
Starting M46069867 fft length = 3456K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3456K
Starting M46069867 fft length = 3840K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3840K
Starting M46069867 fft length = 4000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4000K
Starting M46069867 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4096K
Starting M46069867 fft length = 4608K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4608K
Starting M46069867 fft length = 4800K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4800K
Starting M46069867 fft length = 5120K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.37068 >= 0.35, increasing n from 5120K
Starting M46069867 fft length = 5760K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 5760K
Starting M46069867 fft length = 6144K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6144K
Starting M46069867 fft length = 6400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6400K
Starting M46069867 fft length = 6912K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6912K
Starting M46069867 fft length = 7680K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 7680K
Starting M46069867 fft length = 8000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
[/CODE][/QUOTE]
:huh:

Try 1728 and 1792. Otherwise I have no idea.

flashjh 2012-09-02 03:23

Try running CUDALucas -r > output.txt

I'm curious if your card can pass the self test (-r)

Can you also try the 3.2 | sm_13 build?

kladner 2012-09-02 03:34

[QUOTE=Dubslow;310005]:huh:

Try 1728 and 1792. Otherwise I have no idea.[/QUOTE]

1728 is still running, longer than any previous attempt. It seems to have passed the crisis point which took down the others. I'll give it a while longer and post the output, but this looks promising.

Thanks Bill.

EDIT: @flash: Thanks. I'll post the results shortly.

EDIT2: @Dubslow [CODE]
Starting M46069867 fft length = 1728K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1728K
Starting M46069867 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1920K
Starting M46069867 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2048K
Starting M46069867 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M46069867 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.49219 >= 0.35, increasing n from 2400K
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.07196, max error = 0.10547
Iteration 200, average error = 0.08213, max error = 0.10156
Iteration 300, average error = 0.08600, max error = 0.10156
Iteration 400, average error = 0.08815, max error = 0.10938
Iteration 500, average error = 0.08854, max error = 0.10156
Iteration 600, average error = 0.08882, max error = 0.10156
Iteration 700, average error = 0.08948, max error = 0.10156
Iteration 800, average error = 0.08965, max error = 0.09766
Iteration 900, average error = 0.08963, max error = 0.09766
Iteration 1000, average error = 0.08976 < 0.25 (max error = 0.10156), continuing test.
Iteration 10000 M( 46069867 )C, 0x21048490b7febb41, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:25 real, 26.4671 ms/iter, ETA 338:33:31)
Iteration 20000 M( 46069867 )C, 0xc9c576910a569076, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:24 real, 26.3610 ms/iter, ETA 337:07:41)
Iteration 30000 M( 46069867 )C, 0x1deb3580b15cb791, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:23 real, 26.3625 ms/iter, ETA 337:04:25)
SIGINT caught, writing checkpoint. Estimated time spent so far: 16:00
[/CODE]

I'll do the self-test next.

kladner 2012-09-02 04:44

1 Attachment(s)
Hi Jerry,

These are the results of only a partial CUDALucas-2.04-Beta-3.2-sm_13-x64 -r > output.txt. I'm letting it run again and will be more patient this time. How long should it be expected to run?

flashjh 2012-09-02 04:50

[QUOTE=kladner;310013]Hi Jerry,

These are the results of only a partial CUDALucas-2.04-Beta-3.2-sm_13-x64 -r > output.txt. I'm letting it run again and will be more patient this time. How long should it be expected to run?[/QUOTE]
From the source code for -r:
[CODE]
check (86243, "23992ccd735a03d9");
check (132049, "4c52a92b54635f9e");
check (216091, "30247786758b8792");
check (756839, "5d2cbe7cb24a109a");
check (859433, "3c4ad525c2d0aed0");
check (1257787, "3f45bf9bea7213ea");
check (1398269, "a4a6d2f0e34629db");
check (2976221, "2a7111b7f70fea2f");
check (3021377, "6387a70a85d46baf");
check (6972593, "88f1d2640adb89e1");
check (13466917, "9fdc1f4092b15d69");
check (20996011, "5fc58920a821da11");
check (24036583, "cbdef38a0bdc4f00");
check (25964951, "62eb3ff0a5f6237c");
check (30402457, "0b8600ef47e69d27");
check (32582657, "02751b7fcec76bb1");
check (37156667, "67ad7646a1fad514");
check (42643801, "8f90d78d5007bba7");
check (43112609, "e86891ebf6cd70c4");
[/CODE]

From your output.txt, you were on 24036583, so you were more than halfway through the list, but each check gets slower as the exponents grow. I can't remember how long it takes on a 580. So far it is working on your 570. Back to the drawing board? Did you try 3.2 yet? Does it do the same thing? Please post the final -r results when they complete. Thanks.
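To put a rough number on "over half done, but they get slower", here is a small Python sketch. The exponents are copied from the check list above. The cost model is an assumption, not something taken from the CUDALucas source: it supposes every check runs the same number of iterations, so the cost of a check grows roughly in proportion to its exponent.

```python
# Exponents copied from the check(...) list quoted above.
SELF_TEST_EXPONENTS = [
    86243, 132049, 216091, 756839, 859433, 1257787, 1398269,
    2976221, 3021377, 6972593, 13466917, 20996011, 24036583,
    25964951, 30402457, 32582657, 37156667, 42643801, 43112609,
]

def progress_at(exponent, exponents=SELF_TEST_EXPONENTS):
    """Return (checks_finished, share_of_work_finished) on reaching `exponent`.

    ASSUMPTION: the cost of one check is proportional to its exponent.
    This is only a rough model for illustration.
    """
    idx = exponents.index(exponent)   # checks already completed
    done = sum(exponents[:idx])       # estimated work already completed
    return idx, done / sum(exponents)

finished, share = progress_at(24036583)
print(f"{finished} of {len(SELF_TEST_EXPONENTS)} checks finished, "
      f"about {share:.0%} of the estimated work")
```

Under that assumption, being past the midpoint of the list still leaves most of the running time ahead, because the remaining exponents are the largest ones.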

flashjh 2012-09-02 04:58

[QUOTE=kladner;310003][,1920K]
[CODE]Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2560K
[/CODE][/QUOTE]


[QUOTE=kladner;310007]1728 is still running, longer than any previous attempt. It seems to have passed the crisis point which took down the others. I'll give it a while longer and post the output, but this looks promising.
[CODE]Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.07196, max error = 0.10547
Iteration 200, average error = 0.08213, max error = 0.10156
Iteration 300, average error = 0.08600, max error = 0.10156
Iteration 400, average error = 0.08815, max error = 0.10938
Iteration 500, average error = 0.08854, max error = 0.10156
Iteration 600, average error = 0.08882, max error = 0.10156
Iteration 700, average error = 0.08948, max error = 0.10156
Iteration 800, average error = 0.08965, max error = 0.09766
Iteration 900, average error = 0.08963, max error = 0.09766
Iteration 1000, average error = 0.08976 < 0.25 (max error = 0.10156), continuing test.
Iteration 10000 M( 46069867 )C, 0x21048490b7febb41, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:25 real, 26.4671 ms/iter, ETA 338:33:31)
Iteration 20000 M( 46069867 )C, 0xc9c576910a569076, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:24 real, 26.3610 ms/iter, ETA 337:07:41)
Iteration 30000 M( 46069867 )C, 0x1deb3580b15cb791, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:23 real, 26.3625 ms/iter, ETA 337:04:25)
SIGINT caught, writing checkpoint. Estimated time spent so far: 16:00
[/CODE][/QUOTE]

These are two of your runs, and they have me concerned. The same FFT length (2560K) worked one time and not the other? The only thing you changed is the starting FFT. Is this reproducible using 1920K and 1728K?

Let me know. I'd like to zip up my CuLu run directory and email it to you for a test, to see if one of your files is corrupted. Can you PM me your email address?
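For comparing runs like the two quoted above, a log summary can help. The following Python sketch is not part of CUDALucas. It only pattern-matches the three kinds of log lines that appear in this thread, and the function name `summarize` is made up for illustration.

```python
import re

# Patterns for the log lines shown in this thread.
START = re.compile(r"Starting M(\d+) fft length = (\d+)K")
BUMP = re.compile(r"err = ([\d.]+) >= 0\.35, increasing n from (\d+)K")
PASSED = re.compile(r"Iteration 1000, average error = ([\d.]+) < 0\.25")

def summarize(log_text):
    """Return [(fft_length_in_K, 'rejected' | 'passed'), ...] in the order tried."""
    results = []
    current = None  # FFT length of the attempt currently being read
    for line in log_text.splitlines():
        m = START.search(line)
        if m:
            current = int(m.group(2))
            continue
        m = BUMP.search(line)
        if m:
            results.append((int(m.group(2)), "rejected"))
            current = None
            continue
        if PASSED.search(line) and current is not None:
            results.append((current, "passed"))
            current = None
    return results

run_a = """Starting M46069867 fft length = 2560K
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2560K"""
run_b = """Starting M46069867 fft length = 2560K
Iteration 1000, average error = 0.08976 < 0.25 (max error = 0.10156), continuing test."""

print(summarize(run_a), summarize(run_b))
```

Running it over two output.txt files and comparing the lists shows at a glance which FFT lengths behaved differently between runs.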

kladner 2012-09-02 05:08

1 Attachment(s)
This run completed in a -t style termination. This, and the last were on CUDALucas-2.04-Beta-3.2-sm_13-x64. This would certainly be my preferred version if the current situation will unkink.

I'll send you an address via PM.

Thanks again.
Kieren

flashjh 2012-09-02 05:12

@Dubslow:
Here are the last few lines of the termination:
[CODE]
Iteration 900, average error = 0.03671, max error = 0.04199
Iteration 1000, average error = 0.03675 < 0.25 (max error = 0.04102), continuing test.
Iteration = 1341 >= 1000 && err = 0.5 >= 0.35, fft length = 160K, writing checkpoint file (because -t is enabled) and exiting.
[/CODE]
I remember reading somewhere (?) about Prime95 having a problem with reporting incorrect rounding errors. If you look at the run, the error is ~0.03 (max 0.04) and then jumps to 0.5, which doesn't make sense. Do you remember reading the Prime95 error stuff? It seems like it was during testing of 27.7. Could this be related?
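Some background on why 0.5 stands out. In an FFT-based squaring the results should be whole numbers, and the round-off error is usually defined as the largest distance between a result and its nearest integer. That distance can never exceed 0.5, so a reported 0.5 is the worst possible value rather than a slightly high one. The Python sketch below illustrates that general definition. It is not the CUDALucas implementation, and the sample values are invented.

```python
def roundoff_error(values):
    """Largest distance from any value to its nearest integer (at most 0.5)."""
    return max(abs(v - round(v)) for v in values)

# Invented sample data: results that are close to integers, as in a healthy run.
healthy = [1234.02, -87.97, 5.0]
# Invented sample data: a result that is exactly halfway between two integers.
garbage = [1234.5]

print(roundoff_error(healthy))  # small, comparable to the ~0.03 seen before the jump
print(roundoff_error(garbage))  # the 0.5 ceiling
```

A step from about 0.03 straight to the ceiling, with nothing in between, looks more like corrupted data than like an FFT length that is slightly too small.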

@kladner: PM received, I'll send you what I have for testing.

kladner 2012-09-02 13:26

1 Attachment(s)
@Jerry: PM received. A typical output.txt is attached.

EDIT: It is strange that CL ran OK with ",1728K" appended. That was a limited-time test, though. I don't know if it would have continued. My general experience has been that CL will sometimes run from the beginning (no checkpoint files) but fail on a restart.

flashjh 2012-09-02 13:32

At this point, I am inclined to say your 570 has a memory problem. It has been a while, but somewhere back in this thread we discussed memory issues. For TF it's not that big a deal, but for CL it's critical. When I get some time later, I'll flip back and see what I can find for testing. We don't have a working copy of the GPU mem test for Windows yet.

Is the 570 overclocked?

kladner 2012-09-02 14:09

The 570 is running at nVidia (not Gigabyte) stock: 732MHz. The factory OC would be 781MHz.

I have a copy of Memtest G80. I uploaded it here:

Download URL: [URL]http://www.gigasize.com/get/lvxtp94xrkc[/URL]

I just ran it on the 570 for 50 iterations without errors. In the past I've run it for several hundred. This is the command line I use:

[CODE]memtestg80 -g 0 -b 1150 50[/CODE]
-g is GPU number
-b blocks the program from calling home to Stanford without a prompt
1150 is about the largest amount of memory in MB I can get to test
50 is the number of iterations
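For anyone scripting repeated runs, the same command line can be assembled like this. It is only a sketch: the helper name `memtest_args` is made up, and the executable name is assumed to be on the PATH. The flags are the ones explained above.

```python
def memtest_args(gpu, megabytes, iterations, exe="memtestG80"):
    """Build the argument list: -g picks the GPU, -b suppresses the Stanford upload."""
    return [exe, "-g", str(gpu), "-b", str(megabytes), str(iterations)]

args = memtest_args(gpu=0, megabytes=1150, iterations=50)
print(" ".join(args))
# The list could be handed to subprocess.run(args) to actually launch the test.
```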

Of course, this does not completely prove that the VRAM is sound. This card also ran with various speeds on OCCT without errors. Unfortunately, OCCT only seems to work on the active display GPU, which is the 460, so I can't retest the 570 right now.


Here are the documented switches for the program:

[CODE]F:\Dnld\Memtest86\memtestG80-1.1-windows>memtestG80 /?
-------------------------------------------------------------
| MemtestG80 v1.00 |
| |
| Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters] |
| |
| Defaults: GPU 0, 128MB RAM, 50 test iterations |
| Amount of tested RAM will be rounded up to nearest 2MB |
-------------------------------------------------------------

Available flags:
--gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
--license ,-l : show license terms for this build
--forcecomm, -f : DO send test results to Stanford (don't prompt)
--bancomm, -b : DO NOT send test results to Stanford (don't prompt)
--ramclock X , -r X: Specify RAM clock speed (for returned results) as X MHz
--coreclock X , -c X: Specify core/ROP clock speed (for returned results) as X MHz[/CODE]

flashjh 2012-09-02 14:31

[QUOTE=kladner;310048]The 570 is running at nVidia (not Gigabyte) stock: 732MHz. The factory OC would be 781MHz.

I have a copy of Memtest G80. I uploaded it here:

Download URL: [URL]http://www.gigasize.com/get/lvxtp94xrkc[/URL]

I just ran it on the 570 for 50 iterations without errors. In the past I've run it for several hundred. This is the command line I use:

[CODE]memtestg80 -g 0 -b 1150 50[/CODE]
-g is GPU number
-b blocks the program from calling home to Stanford without a prompt
1150 is about the largest amount of memory in MB I can get to test
50 is the number of iterations
[/QUOTE]
I didn't know about that program. At least the initial indication is that it's ok.

Anyone else have any ideas?

Edit: So I don't know why I didn't try the actual exponent before, but here is what I get for my GT 430:

[CODE]C:\CUDA2>cudalucas
------- DEVICE 1 -------
name GeForce GT 430
totalGlobalMem 1073741824
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
Compatibility 2.1
clockRate (MHz) 1400
textureAlignment 512
deviceOverlap 1
multiProcessorCount 2
Starting M46069867 fft length = 1536K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1536K
Starting M46069867 fft length = 1600K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1600K
Starting M46069867 fft length = 1728K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1728K
Starting M46069867 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1920K
Starting M46069867 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2048K
Starting M46069867 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M46069867 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.47656 >= 0.35, increasing n from 2400K
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.06564, max error = 0.09375
Iteration 200, average error = 0.07594, max error = 0.09375
Iteration 300, average error = 0.08075, max error = 0.09839
Iteration 400, average error = 0.08302, max error = 0.10156
Iteration 500, average error = 0.08393, max error = 0.10156
Iteration 600, average error = 0.08452, max error = 0.10156
Iteration 700, average error = 0.08507, max error = 0.10938
Iteration 800, average error = 0.08543, max error = 0.09375
Iteration 900, average error = 0.08551, max error = 0.09375
Iteration 1000, average error = 0.08555 < 0.25 (max error = 0.09375), continuing test.
Iteration 1000 M( 46069867 )C, 0xc2e28e698d96a6a5, n = 2560K, CUDALucas v2.04 Beta err = 0.0000 (0:40 real, 398.6551 ms/iter, ETA 5101:32:40)
Iteration 1100 M( 46069867 )C, 0x26edb553f2dde8c5, n = 2560K, CUDALucas v2.04 Beta err = 0.1016 (0:04 real, 37.2975 ms/iter, ETA 477:17:25)
Iteration 1200 M( 46069867 )C, 0x447233a3146c56c2, n = 2560K, CUDALucas v2.04 Beta err = 0.0859 (0:04 real, 36.8193 ms/iter, ETA 471:10:14)
Iteration 1300 M( 46069867 )C, 0x9bb8ac1450ec2bc3, n = 2560K, CUDALucas v2.04 Beta err = 0.0938 (0:03 real, 36.7898 ms/iter, ETA 470:47:31)
Iteration 1400 M( 46069867 )C, 0xa2a39d4ce7fa4ae5, n = 2560K, CUDALucas v2.04 Beta err = 0.0898 (0:04 real, 36.7626 ms/iter, ETA 470:26:34)
Iteration 1500 M( 46069867 )C, 0x5eb5481a8ef7ce27, n = 2560K, CUDALucas v2.04 Beta err = 0.1133 (0:04 real, 37.2806 ms/iter, ETA 477:04:12)
Iteration 1600 M( 46069867 )C, 0x41ae1d631ed147d8, n = 2560K, CUDALucas v2.04 Beta err = 0.0938 (0:04 real, 37.0722 ms/iter, ETA 474:24:07)
Iteration 1700 M( 46069867 )C, 0xa9fe7fa6770b4cf9, n = 2560K, CUDALucas v2.04 Beta err = 0.0938 (0:03 real, 36.7744 ms/iter, ETA 470:35:26)
Iteration 1800 M( 46069867 )C, 0xda20917b2a910d81, n = 2560K, CUDALucas v2.04 Beta err = 0.0938 (0:04 real, 36.7276 ms/iter, ETA 469:59:27)
Iteration 1900 M( 46069867 )C, 0xeffa01cc22355f33, n = 2560K, CUDALucas v2.04 Beta err = 0.0977 (0:04 real, 37.1133 ms/iter, ETA 474:55:30)
SIGINT caught, writing checkpoint. Estimated time spent so far: 1:17[/CODE]

So it didn't fail, but it does the same thing: it rejects every FFT length below 2560K before settling. I'll do some more testing later.
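As a sanity check on the ETA column in the log above, the arithmetic can be reproduced in a few lines of Python. This is a sketch under one assumption: that the program counts the usual p - 2 iterations of a Lucas-Lehmer test and simply multiplies the remaining iterations by the current ms/iter. The function names are made up.

```python
def eta_seconds(exponent, iteration, ms_per_iter):
    """Estimated seconds left, assuming a full test takes exponent - 2 iterations."""
    remaining = (exponent - 2) - iteration
    return remaining * ms_per_iter / 1000.0

def hms(seconds):
    """Format seconds as H:MM:SS, like the ETA column in the log."""
    s = int(round(seconds))
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"

# Values taken from the "Iteration 1200" line of the GT 430 log.
print(hms(eta_seconds(46069867, 1200, 36.8193)))
```

The result lands within a few seconds of the 471:10:14 printed in the log. The same arithmetic also explains the 5101-hour ETA on the "Iteration 1000" line: that interval's 398.6551 ms/iter figure is about ten times the steady-state value, so the projected time is too.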

kladner 2012-09-02 14:54

I tried another run with 2560K plugged in (since that worked for you), but the results are the same except for a different final error message.
[CODE]E:\CUDA>CUDALucas-2.04-Beta-3.2-sm_13-x64

mkdir: cannot create directory `savefiles': File exists
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.39999 >= 0.35, increasing n from 2560K
Starting M46069867 fft length = 2880K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2880K
Starting M46069867 fft length = 3072K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3072K
Starting M46069867 fft length = 3200K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3200K
Starting M46069867 fft length = 3456K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3456K
Starting M46069867 fft length = 3840K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.49997 >= 0.35, increasing n from 3840K
Starting M46069867 fft length = 4000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4000K
Starting M46069867 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4096K
Starting M46069867 fft length = 4608K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4608K
Starting M46069867 fft length = 4800K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4800K
Starting M46069867 fft length = 5120K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 5120K
Starting M46069867 fft length = 5760K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 5760K
Starting M46069867 fft length = 6144K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6144K
Starting M46069867 fft length = 6400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6400K
Starting M46069867 fft length = 6912K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
[B][COLOR=Red]CUDALucas.cu(693) : cudaSafeCall() Runtime API error 30: unknown error.
[/COLOR][/B](Emphasis added.)
E:\CUDA>[/CODE]EDIT: More head scratching here. I just started another run with ",2560K" appended to the worktodo line. Still running ATM.
[CODE]E:\CUDA>CUDALucas-2.04-Beta-3.2-sm_13-x64

mkdir: cannot create directory `savefiles': File exists
Starting M46069867 fft length = 1728K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1728K
Starting M46069867 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1920K
Starting M46069867 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2048K
Starting M46069867 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M46069867 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.46875 >= 0.35, increasing n from 2400K
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.06860, max error = 0.10938
Iteration 200, average error = 0.07823, max error = 0.10547
Iteration 300, average error = 0.08137, max error = 0.09375
Iteration 400, average error = 0.08270, max error = 0.10156
Iteration 500, average error = 0.08382, max error = 0.09375
Iteration 600, average error = 0.08443, max error = 0.09375
Iteration 700, average error = 0.08478, max error = 0.09375
Iteration 800, average error = 0.08481, max error = 0.09375
Iteration 900, average error = 0.08498, max error = 0.09375
Iteration 1000, average error = 0.08530 < 0.25 (max error = 0.09375), continuing test.[/CODE]EDIT2: After a system restart, CL on the 570 failed with the following:
[CODE]\CUDA>CUDALucas-2.04-Beta-3.2-sm_13-x64

mkdir: cannot create directory `savefiles': File exists
Continuing work from a partial result of M46069867 fft length = 2560K iteration = 11070
Iteration = 11073 >= 1000 && err = 0.5 >= 0.35, fft length = 2560K, writing checkpoint file (because -t is enabled) and exiting.[/CODE]

flashjh 2012-09-02 15:02

Can someone try this exponent on Linux and post the results? 46069867 (Not the whole run, just need to know if it works or not.)

kladner 2012-09-02 16:01

I've tried 1728K and 1920K with similar results. Here are the respective final lines.[CODE]Iteration 1000, average error = 0.08530 < 0.25 (max error = 0.09375), continuing test.
Iteration = 1962 >= 1000 && err = 0.5 >= 0.35, fft length = 2560K, writing checkpoint file (because -t is enabled) and exiting.

Iteration 1000, average error = 0.00167 < 0.25 (max error = 0.00183), continuing test.
Iteration = 2541 >= 1000 && err = 0.49985 >= 0.35, fft length = 3072K, writing checkpoint file (because -t is enabled) and exiting.

[/CODE]
A question occurs to me. I have had some trouble getting nVidia drivers installed such that mfaktc will run without crippling Interrupt overhead. I am currently running 301.42. In the past I got along best with 285.62. However, my understanding is that mfaktc 0.19 requires the later driver. Am I correct in this belief? I'd be delighted to get back to 285.62 if things would run correctly with it.

EDIT: If an FFT is specified, why does CL go through the Auto routine anyway?

Dubslow 2012-09-02 16:17

[QUOTE=kladner;310054]
EDIT: If an FFT is specified, why does CL go through the Auto routine anyway?[/QUOTE]

Just because the user said to do something doesn't mean it will work.

I'll run a Linux test later today.
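The behaviour in the logs (start at the requested length, then step up whenever the early round-off error is too high) can be sketched as follows. This is a reading of the log output, not the actual CUDALucas code. The table holds only lengths that appear in this thread, the 0.35 limit is the one printed in the log lines, and `early_error` is a hypothetical stand-in for running the first few iterations on the GPU.

```python
# FFT lengths (in K) in the order they appear in the logs in this thread.
FFT_TABLE_K = [1536, 1600, 1728, 1920, 2048, 2304, 2400, 2560, 2880, 3072]

def choose_fft(requested_k, early_error, table=FFT_TABLE_K, limit=0.35):
    """Return the first length >= requested_k whose early error is below limit.

    early_error(k) is a hypothetical callback that returns the largest
    round-off error seen in the trial iterations at FFT length k.
    Returns None if every remaining length is rejected.
    """
    for k in table:
        if k < requested_k:
            continue          # never go below what the user asked for
        if early_error(k) < limit:
            return k          # this length survives the careful test
    return None

# Stand-in for the GPU: lengths below 2560K show err = 0.5, as in the logs.
fake_gpu = lambda k: 0.5 if k < 2560 else 0.10
print(choose_fft(1728, fake_gpu))
```

So a user-specified FFT length acts as a starting point rather than a fixed setting: it is used only if it passes the same check.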

kladner 2012-09-02 16:21

[QUOTE=Dubslow;310055]Just because the user said to do something doesn't mean it will work.

I'll run a Linux test later today.[/QUOTE]

OK, thanks.

One further note: I did try a DC (double-check) but got the same kind of result, so it would not seem that the exponent itself is the problem.

kladner 2012-09-02 17:42

FWIW, I got my display running on the GTX 570 and ran OCCT on it for about 15 minutes. No errors reported. However, here is yet another -r run which craps out at the end.
[CODE]E:\CUDA>CUDALucas-2.04-Beta-3.2-sm_13-x64 -r

(ALL RESIDUE CORRECT RESPONSES REMOVED TO COMPLY WITH POSTING LIMITS)

Starting M2976221 fft length = 160K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 160K
Starting M2976221 fft length = 192K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 192K
Starting M2976221 fft length = 240K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.49900 >= 0.35, increasing n from 240K
Starting M2976221 fft length = 256K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00000, max error = 0.00000
Iteration = 144 < 1000 && err = 0.50000 >= 0.35, increasing n from 256K
Starting M2976221 fft length = 288K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 48 < 1000 && err = 0.50000 >= 0.35, increasing n from 288K
Starting M2976221 fft length = 320K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 320K
Starting M2976221 fft length = 384K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 384K
Starting M2976221 fft length = 480K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 480K
Starting M2976221 fft length = 512K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 512K
Starting M2976221 fft length = 576K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 576K
Starting M2976221 fft length = 640K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 640K
Starting M2976221 fft length = 768K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 768K
Starting M2976221 fft length = 864K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 864K
Starting M2976221 fft length = 960K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 960K
Starting M2976221 fft length = 1024K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1024K
Starting M2976221 fft length = 1152K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1152K
Starting M2976221 fft length = 1280K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1280K
Starting M2976221 fft length = 1440K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1440K
Starting M2976221 fft length = 1536K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1536K
Starting M2976221 fft length = 1600K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1600K
Starting M2976221 fft length = 1728K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1728K
Starting M2976221 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1920K
Starting M2976221 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2048K
Starting M2976221 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M2976221 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2400K
Starting M2976221 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2560K
Starting M2976221 fft length = 2880K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2880K
Starting M2976221 fft length = 3072K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3072K
Starting M2976221 fft length = 3200K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3200K
Starting M2976221 fft length = 3456K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3456K
Starting M2976221 fft length = 3840K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.49709 >= 0.35, increasing n from 3840K
Starting M2976221 fft length = 4000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00000, max error = 0.00000
Iteration 200, average error = 0.01625, max error = 0.25000
Iteration 300, average error = 0.01083, max error = 0.00000
Iteration 400, average error = 0.00813, max error = 0.00000
Iteration 500, average error = 0.00650, max error = 0.00000
Iteration 600, average error = 0.00542, max error = 0.00000
Iteration 700, average error = 0.00464, max error = 0.00000
Iteration 800, average error = 0.00406, max error = 0.00000
Iteration = 848 < 1000 && err = 0.50000 >= 0.35, increasing n from 4000K
Starting M2976221 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04760, max error = 0.31731
Iteration = 144 < 1000 && err = 0.50000 >= 0.35, increasing n from 4096K
Starting M2976221 fft length = 4608K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04683, max error = 0.22884
Iteration 200, average error = 0.04091, max error = 0.25000
Iteration 300, average error = 0.02728, max error = 0.00000
Iteration 400, average error = 0.02046, max error = 0.00000
Iteration 500, average error = 0.01637, max error = 0.00000
Iteration 600, average error = 0.01364, max error = 0.00000
Iteration 700, average error = 0.01169, max error = 0.00000
Iteration 800, average error = 0.01023, max error = 0.00000
Iteration 900, average error = 0.00909, max error = 0.00000
Iteration 1000, average error = 0.00819 < 0.25 (max error = 0.00000), continuing test.
Iteration = 1887 >= 1000 && err = 0.5 >= 0.35, fft length = 4608K, writing checkpoint file (because -t is enabled) and exiting.
[/CODE]
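
For anyone reading along, a log like the one above can be summarized mechanically. This is a minimal sketch, not part of CUDALucas: it assumes only the line format visible in the log, and `summarize_roundoff_log` is a hypothetical helper name. It lists every FFT length that was abandoned, with the iteration and error at which it happened.

```python
import re

# Matches the "giving up on this FFT length" lines seen in the log above.
BUMP_RE = re.compile(
    r"Iteration = (\d+) < \d+ && err = ([0-9.]+) >= [0-9.]+, "
    r"increasing n from (\d+)K"
)

def summarize_roundoff_log(text):
    """Return (fft_length_k, iteration, err) for every abandoned FFT length."""
    bumps = []
    for line in text.splitlines():
        m = BUMP_RE.search(line)
        if m:
            iteration = int(m.group(1))
            err = float(m.group(2))
            fft_k = int(m.group(3))
            bumps.append((fft_k, iteration, err))
    return bumps

# Two pairs of lines copied from the log above.
sample = """\
Starting M2976221 fft length = 288K
Iteration = 48 < 1000 && err = 0.50000 >= 0.35, increasing n from 288K
Starting M2976221 fft length = 3840K
Iteration = 16 < 1000 && err = 0.49709 >= 0.35, increasing n from 3840K
"""

for fft_k, iteration, err in summarize_roundoff_log(sample):
    print(f"{fft_k}K abandoned at iteration {iteration}, err = {err:.5f}")
```

Run over the full log, a summary like this makes the pattern easy to see: the error sits at or near 0.5 within the first few dozen iterations at almost every length, including lengths far larger than this exponent should need.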

Dubslow 2012-09-02 18:12

Hmm.... does this happen for 2.03?

And this doesn't happen for the 460|2.04?

(Edit: Perchance, could you try disabling -t/CheckRoundoffAllIterations? It might not make any difference at all, but if it does...)

flashjh 2012-09-02 19:08

Ok, at this point we have (using latest 2.04 beta):

- 4.0 | 2.0 and 3.2 | 1.3 do the same thing
- Self test (-r) fails at M2976221 on both versions
- Your memtestG80 passes
- Your OCCT passes
- Sometimes your M46069867 run works, sometimes not
- A DC did the same thing
- Starting with different FFT lengths doesn't really help
- Don't know yet if the 460 works or not with CuLu
- Don't know yet if 2.03 works with the 460 or 570

I'm sure I missed something, so please fill in the gaps, as necessary.

I'm suspecting drivers, maybe? Does anyone have any ideas besides drivers?

@kladner: Can you give some background on the driver problem? Are you able to completely uninstall ALL nVidia software, restart and then reinstall the newest drivers (not beta)?

kladner 2012-09-02 20:07

[QUOTE]- Don't know yet if the 460 works or not with CuLu
- Don't know yet if 2.03 works with the 460 or 570 [/QUOTE]

The 460 does fine with 2.03 and 2.04. I swapped the tasks back again and ran -r, which completed successfully. (only tested with 3.2|1.3)

The 570 fails the self-test in 2.03 and 2.04. (only tested with 3.2|1.3 in 2.03)

You are largely complete in the rest of your list.

@Dubslow: I'll try disabling -t at some point. I have tended to suspect drivers as Jerry does.

[QUOTE]@kladner: Can you give some background on the driver problem? Are you able to completely uninstall ALL nVidia software, restart and then reinstall the newest drivers (not beta)? [/QUOTE]

When I added the 570 to the existing setup I had to reinstall drivers. Initially, I put 4x mfaktc on the 570 and started running CuLu on the 460. While watching things run I saw that mfaktc was being CPU starved.

I went on at more length in an earlier post, but through searching I found a utility called Process Explorer on an MS Technet site. It showed that as I started up instances of mfaktc, the CPU was being loaded by "Interrupts" at ~6-12%. This was also impacting Prime95 running P-1. After many attempts I found that it takes really diligent uninstallation, registry cleaning, use of a program called Drive Fusion (cleans things out), working in Safe Mode, etc. etc. to get a clean install.

In short, yes, I can wipe out drivers and get versions installed which don't have the crippling effect on mfaktc. CuLu seems to have called attention to other issues.

While blaming drivers is a more comforting thought, I have to admit that all this calls the hardware (570) into question, too.

Thanks for all your efforts, guys. I will pursue this again in a few hours. I'm committed to firing up some charcoal and doing brats and burgers.

[OT] For anyone in the Chicago area, you can't beat Paulina Meat Market. They have the finest fresh pork brats I've ever eaten, and their ground chuck beats anything you can get at a supermarket. [/OT]

flashjh 2012-09-02 21:53

[QUOTE=kladner;310067]I went on at more length in an earlier post, but through searching I found a utility called Process Explorer on an MS Technet site. It showed that as I started up instances of mfaktc, the CPU was being loaded by "Interrupts" at ~6-12%. This was also impacting Prime95 running P-1. After many attempts I found that it takes really diligent uninstallation, registry cleaning, use of a program called Drive Fusion (cleans things out), working in Safe Mode, etc. etc. to get a clean install.[/QUOTE]
I'm a bit confused about the "Interrupts". What exactly do you mean? mfaktc is supposed to use about 1 core per instance. The exact amount certainly depends on your GPU and CPU, among other things.

For example, I can run 6 instances of mfaktc (.19) on my 580, which leaves enough overhead to run a P-1 and still be able to use the computer. The P-1 slows down quite a bit when I run mfaktc, but I don't mind because TF is the main goal.

Anyway, are you trying to make mfaktc not use the CPU, or is it something else?

Edit: Can you do something for me?

- Uninstall ALL nVidia software
- Make sure mfaktc won't run upon reboot
- Download the newest (non-beta) nVidia drivers
- Shutdown and remove the 460
- Make sure the 570 is in the primary PCI-e slot
- Reboot into Windows
- Let Windows install the video card (reboot if it asks)
- Reboot if you didn't already here ^^^
- Install nVidia's drivers, leave everything default
- Reboot again
- Try a clean CUDALucas self-test run and let us know the results.

(The fact that the 460 works fine concerns me, but let's narrow down the drivers issue first)

kladner 2012-09-03 00:56

1 Attachment(s)
[QUOTE=flashjh;310077]I'm a bit confused about the "Interrupts".
...............
Anyway, are you trying to make mfaktc not use the CPU, or is it something else?

Edit: Can you do something for me?

- Uninstall ALL nVidia software
- Make sure mfaktc won't run upon reboot
- Download the newest (non-beta) nVidia drivers
- Shutdown and remove the 460
- Make sure the 570 is in the primary PCI-e slot
- Reboot into Windows
- Let Windows install the video card (reboot if it asks)
- Reboot if you didn't already here ^^^
- Install nVidia's drivers, leave everything default
- Reboot again
- Try a clean CUDALucas self-test run and let us know the results.

(The fact that the 460 works fine concerns me, but let's narrow down the drivers issue first)[/QUOTE]

To answer the second question first, you have outlined something I have been mulling over. The 570 has never seemed quite fully integrated. By itself, it shows full capability in GPUZ. In fact, now that it is driving the display it shows full capability. See the attached for both the Interrupt issue and for which card reports CUDA etc. Previously, the 570 had the boxes unchecked which the 460 now has unchecked. This is despite the fact that the 570 ran mfaktc just fine while saying no CUDA, and the 460 is running CuLu as I write.

I have Affinity set for each instance of mfaktc, one Phenom II core each. To begin to saturate the 570 takes 4 instances, and it helps to OC the CPU and throttle the GPU. However, heat issues this time of year lead to throttling everything.

One other thing I found is that the 2 3D nVidia drivers frequently glitch during install, with error messages, but not aborting. Also, after a recent install [U]with[/U] 3D drivers I had "nvstlink.exe" consuming as much CPU as a properly functioning mfaktc instance while performing no known function. Meanwhile, 2 mfaktc instances were running at about half their normal throughput. Uninstalling the 3D, and 3D Control drivers and rebooting put a stop to that. Since I don't game with 3D glasses, or much at all, I don't think I'll miss it.

I think you are correct in your prescription of getting one thing going at a time with a very clean driver install. I will use it as a reference. Thanks!

The Hardware Interrupt item shows in the Process Explorer capture, but at a fairly restrained level. On what I have taken to be "bad driver installs," that might be consuming more than 10% CPU instead of 1-2% or less with mfaktc running in 3-4 instances.

There's a lot of this I can't explain. I have observed some reproducible (on this machine) behavior. This includes things like the "Interrupt Load" going up and down as I started or stopped mfaktc instances during a "Bad Driver Episode". I am putting some things in quotes because that's just how I think of them. I cannot be sure what's really happening.

I'll post results when I have a chance to tear things down on the hardware and software fronts. I [I]have[/I] come to see that sometimes sequence, and overkill search and destroy in wiping out drivers, can make a difference. I've worked up quite a double checking routine. It's amazing how things with "nvidia" in the name can creep back in.

kladner 2012-09-03 03:41

GTX 570 first attempt
 
1 Attachment(s)
I'll have another shot at it in the morning. This one still crapped out.

kladner 2012-09-03 14:01

Latest attempt GTX 570
 
1 Attachment(s)
[QUOTE](Edit: Perchance, could you try disabling -t/CheckRoundoffAllIterations? It might not make any difference at all, but if it does...) [/QUOTE]

The latest run is attached. -t is disabled in the ini, but the self-test seems to use it anyway.

This is following the cleaned out driver reinstall, with only the 570 installed. I'm going to have another go at it.

EDIT: This test run hung up on M859433 instead of M2976221, which usually stops it.

kladner 2012-09-03 14:33

Continued testing
 
This is immediately after a default install of 301.42. The self-test failed again, as did an attempt to just run CuLu. This is with Check Roundoff disabled in the ini. The last two lines below are ones I've not seen before. Would it be worth a go with Threads=512? I guess I'll find out.

[CODE]Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

E:\CUDA\2.04-BETA>CUDALucas-2.04-Beta-3.2-sm_13-x64

mkdir: cannot create directory `savefiles': File exists
Starting M46069867 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M46069867 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2400K
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2560K
Starting M46069867 fft length = 2880K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2880K
Starting M46069867 fft length = 3072K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3072K
Starting M46069867 fft length = 3200K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3200K
Starting M46069867 fft length = 3456K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3456K
Starting M46069867 fft length = 3840K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3840K
Starting M46069867 fft length = 4000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4000K
Starting M46069867 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4096K
Starting M46069867 fft length = 4608K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4608K
Starting M46069867 fft length = 4800K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4800K
Starting M46069867 fft length = 5120K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 5120K
Starting M46069867 fft length = 5760K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 5760K
Starting M46069867 fft length = 6144K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6144K
Starting M46069867 fft length = 6400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 6400K
Starting M46069867 fft length = 6912K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6912K
Starting M46069867 fft length = 7680K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 7680K
Starting M46069867 fft length = 8000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 8000K
Starting M46069867 fft length = 8192K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 8192K
Starting M46069867 fft length = 9216K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 9216K
Starting M46069867 fft length = 9600K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 9600K
Starting M46069867 fft length = 10240K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 10240K
Starting M46069867 fft length = 11520K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 11520K
Starting M46069867 fft length = 12288K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 12288K
Starting M46069867 fft length = 12800K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 12800K
Starting M46069867 fft length = 13824K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 13824K
Starting M46069867 fft length = 15360K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 15360K
Starting M46069867 fft length = 16000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 16000K
over specifications Grid = 65536
try increasing threads (256) or decreasing FFT length (16384K)[/CODE]

EDIT: No Joy.

[CODE]Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 32000K
over specifications Grid = 65536
try increasing threads (512) or decreasing FFT length (32768K)[/CODE]

The tail end of another -r run, after another housecleaning and Safe mode reinstall of 301.42.

[CODE]Starting M3021377 fft length = 4608K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00000, max error = 0.00000
Iteration 200, average error = 0.00000, max error = 0.00000
Iteration 300, average error = 0.00000, max error = 0.00000
Iteration 400, average error = 0.00000, max error = 0.00000
Iteration 500, average error = 0.00000, max error = 0.00000
Iteration 600, average error = 0.00000, max error = 0.00000
Iteration 700, average error = 0.00000, max error = 0.00000
Iteration 800, average error = 0.00000, max error = 0.00000
Iteration 900, average error = 0.00000, max error = 0.00000
Iteration 1000, average error = 0.00000 < 0.25 (max error = 0.00000), continuing test.
Iteration = 1960 >= 1000 && err = 0.5 >= 0.35, fft length = 4608K, writing checkpoint file (because -t is enabled) and exiting.[/CODE]

NOTE: This driver install qualifies as "Good" by the standard of running mfaktc without heavy CPU usage by Interrupts.
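
The two "over specifications" messages above are consistent with the grid size being the FFT length divided by the thread count, checked against the 65535-block limit per grid dimension on compute capability 2.x cards. The sketch below is inferred from the logged numbers only, not taken from the CUDALucas source; `grid_size` and `MAX_GRID_DIM` are illustrative names and the one-element-per-thread mapping is an assumption.

```python
MAX_GRID_DIM = 65535  # per-dimension grid limit on compute capability 2.x cards

def grid_size(fft_length_k, threads):
    """Blocks needed for an FFT of fft_length_k * 1024 elements,
    assuming one element per thread (assumed mapping, see lead-in)."""
    return (fft_length_k * 1024) // threads

# The two failing cases from the logs, plus a length from the self-test.
for fft_k, threads in [(16384, 256), (32768, 512), (4608, 256)]:
    g = grid_size(fft_k, threads)
    status = "over the limit" if g > MAX_GRID_DIM else "ok"
    print(f"{fft_k}K / {threads} threads -> grid = {g} ({status})")
```

Under that assumption, doubling Threads only doubles the FFT length at which the ceiling is hit, which matches the second log ending at 32768K instead of 16384K. It does nothing about the round-off errors that drove the length up in the first place.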

kladner 2012-09-03 15:16

Anyone running CUDALucas on a 570?
 
Well really, anyone besides me. I am curious if this malfunction is unique to my card.

EDIT: The only other thing I can think to try is to put the 570 into my partner's i7 920 and see if the same stuff happens. Unfortunately, that machine is running XP 32bit, so not all would be the same for comparison.

EDIT2: It just finished the long self-test (-st2) in mfaktc without error. Sigh.

LaurV 2012-09-03 16:04

Preliminary tests of the new build: it works fine for me, and the lock problem is gone. As I promised some time ago, I redid the tuning for the new build and got the same scores as for the older one. I will copy here the list of FFTs which are the best for GTX580, in increasing order, in k (1k=1024). All the other FFTs, smaller or larger, get lower performance.

[CODE]
32
40
48
54
56
60
80
96
128
144
162
196
200
256
288
324
384
400
512
576
640
648
672
784
864
896
1024
1152
1296
1440
1568
1600
1728
2048
2160
2304
2592
2646
2800
2880
2916
3024
3136
3150
3200
3240
3456
3600
3888
4032
4096
4320
4608
5120
[/CODE]

For example, instead of using the default 160k, you get a 3% performance increase with 162k. Instead of using the default 768k, you can get about 5% faster with 784k. Using 1568k instead of 1536k, you get about 6%-8% faster. With 2592k instead of the default 2560k there is about a 14% performance increase! etc.

I don't guarantee that other cards are the same, and even for gtx580 your mileage may vary. Someone could test these values on linux.
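
To see what the lengths in the list have in common, one can factor them. The sketch below is only an illustration (the helper name `factor_7_smooth` is made up here): it factors an FFT length over the primes 2, 3, 5 and 7 and reports whatever is left over. It says nothing about speed, which still has to be measured per card as described above.

```python
def factor_7_smooth(n):
    """Factor n over the primes 2, 3, 5, 7.

    Returns (exponents, leftover); leftover == 1 means n is 7-smooth.
    """
    exponents = {}
    for p in (2, 3, 5, 7):
        while n % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
    return exponents, n

# Default lengths and the tuned replacements mentioned in the post, in k.
for k in (160, 162, 768, 784, 1536, 1568, 2560, 2592):
    exps, leftover = factor_7_smooth(k * 1024)
    terms = " * ".join(f"{p}^{e}" for p, e in sorted(exps.items()))
    print(f"{k}k = {terms}" + ("" if leftover == 1 else f" * {leftover}"))
```

Both the defaults and the tuned lengths come out as products of small primes, so smoothness alone does not pick the winner; the tuned ones (162k, 784k, 1568k, 2592k) simply use powers of 3 or 7 where the defaults use 3 or 5.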

LaurV 2012-09-03 16:07

@kladner: my money is on a bad card (memory) or wrong drivers. Comparing with mfaktc is irrelevant: mfaktc does not use the card's memory at all. I didn't test the new version (no gtx570 on hand just now), but the old 2.04 worked well on a 570 (except for the file lock problem, whatever, but it did not do the things showing on your screen).

kladner 2012-09-03 16:19

[QUOTE=LaurV;310137]@kladner: my money is on a bad card (memory) or wrong drivers. Comparing with mfaktc is irrelevant: mfaktc does not use the card's memory at all. I didn't test the new version (no gtx570 on hand just now), but the old 2.04 worked well on a 570 (except for the file lock problem, whatever, but it did not do the things showing on your screen).[/QUOTE]

I fear that it is boiling down to a hardware problem. I am still puzzled that memtestG80 finds no problems, but maybe it does not run enough different combinations of bit patterns.

Is the latest build you mention on the SourceForge page?

Dubslow 2012-09-03 16:26

[QUOTE=kladner;310139]Is the latest build you mention on the SourceForge page?[/QUOTE]

It should be, yes.

@LaurV: Given that cufft is written by nVidia, I'm going to go out on a limb and say that OS doesn't really matter (though what card/architecture might matter). In either case, I'm a bit too lazy to run through all the tests. Thanks for the list. (At some point in the future we'll need to extend it past 5M, but I think this covers the leading edge LLs for now.)

LaurV 2012-09-03 16:30

Yes, I meant "the new 2.04 beta built Aug 28", which is on SourceForge, and "the old 2.04 beta", the one with the lock bug, which I was using until a few hours ago (it was on SourceForge before the Aug 28 build was made). The names are a bit confusing.

edit @ Dubslow crosspost: I also think the OS does not matter. Maybe the card does not matter either (I did not use features of the gtx580 like 512 or more threads; I left everything on default). The only thing which should matter is the cufft library and how nvidia/msft (our msft, not the company :D) do the butterflies when squaring the numbers.

firejuggler 2012-09-03 16:30

1 Attachment(s)
CUDALucas-2.04 Beta-4.1-sm_21-x64 -r works with my 560 (see attached)

CUDALucas-2.04 Beta-4.2-sm_30-x64 -r fails (I have the right dll)
[code]
CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED
[/code]

flashjh 2012-09-03 16:33

At this point I have to agree that your card is bad as all the signs point to that. Especially since your 460 works fine and the 570 consistently fails the self test.

I'll try to get the CUDA GPU memtest compiled.

It is certainly worth a shot to try it in another system [I]OR [/I]you could try a clean install of Windows.

(One note, I have not compiled a 32-bit version; if you need it for testing, let me know so I can attempt it, but no promises since it's been a while.)

Who is the manufacturer? You may be able to RMA the card.

Dubslow 2012-09-03 16:34

[QUOTE=firejuggler;310142]
CUDALucas-2.04 Beta-4.2-sm_30-x64 -r fails (I have the right dll)
[code]
CUDALucas.cu(159) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED
[/code][/QUOTE]

A 3.0 binary shouldn't and doesn't work on a 2.1 card.

firejuggler 2012-09-03 16:37

So, whatever happens, I should only get an sm_21 version?

Dubslow 2012-09-03 16:39

[QUOTE=firejuggler;310145]So, whatever happens, I should only get an sm_21 version?[/QUOTE]

Most people have found that 1.3 is fastest, even for 2.x cards. But yes, sm_21 or lower.
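
The rule of thumb here (binary sm level no higher than the card's compute capability) can be written down as a quick check. This is a deliberately simplified sketch: the helper names are hypothetical, it only looks at the `sm_XY` tag in the file name, and it ignores the finer points of how CUDA binaries are built and loaded.

```python
import re

def binary_sm(filename):
    """Extract the sm_XY tag from a binary name as (major, minor), or None."""
    m = re.search(r"sm_(\d)(\d)", filename)
    return (int(m.group(1)), int(m.group(2))) if m else None

def can_run(filename, card_capability):
    """Rule of thumb from the thread: binary sm level <= card capability."""
    sm = binary_sm(filename)
    return sm is not None and sm <= card_capability

gtx560 = (2, 1)  # the 560 is a compute capability 2.1 card
for name in ("CUDALucas-2.04 Beta-4.2-sm_30-x64",
             "CUDALucas-2.04 Beta-4.1-sm_21-x64",
             "CUDALucas-2.04-Beta-3.2-sm_13-x64"):
    print(name, "->", "ok" if can_run(name, gtx560) else "will not run")
```

This only answers "will it start"; which of the runnable builds is fastest (sm_13 for most people, per the post above) still has to be timed.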

flashjh 2012-09-03 16:54

While working on compiling CUDA GPU memtest, I came across [URL="http://www.mersenneforum.org/showthread.php?p=233207#post233207"]this post[/URL].

I ran it on a 580, the default batch file runs 2500 iterations. Mine didn't throw an error, so I guess it worked.

Try running it and see if your 570 passes? (It uses a lot of resources and pushes the GPU to 99%)

Dubslow 2012-09-03 17:00

[QUOTE=flashjh;310148]While working on compiling CUDA GPU memtest, I came across [URL="http://www.mersenneforum.org/showthread.php?p=233207#post233207"]this post[/URL].

I ran it on a 580, the default batch file runs 2500 iterations. Mine didn't throw an error, so I guess it worked.

Try running it and see if your 570 passes? (It uses a lot of resources and pushes the GPU to 99%)[/QUOTE]

Well he does say the program he attached isn't very good, since it didn't fail even at 1.8 GHz.

flashjh 2012-09-03 17:27

[QUOTE=Dubslow;310150]Well he does say the program he attached isn't very good, since it didn't fail even at 1.8 GHz.[/QUOTE]
True, but it must have been worth something, maybe? I'm working on the CUDA memtest, but I need a guide on makefile conversion from Linux to Windows... know of anything? :unsure:

kladner 2012-09-03 17:27

[QUOTE=flashjh;310148]While working on compiling CUDA GPU memtest, I came across [URL="http://www.mersenneforum.org/showthread.php?p=233207#post233207"]this post[/URL].

I ran it on a 580, the default batch file runs 2500 iterations. Mine didn't throw an error, so I guess it worked.

Try running it and see if your 570 passes? (It uses a lot of resources and pushes the GPU to 99%)[/QUOTE]


I'll give it a try. I did notice Oliver's remarks on the mfaktc self-test, including that he does not recommend it as a hardware test.

EDIT: The card is a Gigabyte, Rev2, 3 fan version. I got it off of eBay. I guess I can check the serial number on the Gigabyte site.

flashjh 2012-09-03 17:31

[QUOTE=kladner;310154]I'll give it a try. I did notice Oliver's remarks on the mfaktc self-test, including that he does not recommend it as a hardware test.[/QUOTE]
I was just curious if it would fail since we suspect the memory is bad. Usage on my 580 was only ~300 MB, so it probably won't fail for you anyway.

kladner 2012-09-03 18:48

[QUOTE=flashjh;310155]I was just curious if it would fail since we suspect the memory is bad. Usage on my 580 was only ~300 MB, so it probably won't fail for you anyway.[/QUOTE]

Completed 2500 successfully. I'm running 500 iterations on memtestG80, on 1274MB vRAM--the largest amount I could get the program to allocate.

I'm starting to think that I should put the 570 back on mfaktc duty and run CuLu on the 460 if I can't hammer out any other solution.

Dubslow 2012-09-03 18:51

[QUOTE=kladner;310174]
I'm starting to think that I should put the 570 back on mfaktc duty and run CuLu on the 460 if I can't hammer out any other solution.[/QUOTE]

"If it aint broke, don't fix it" -- a lazy engineer :smile:

flashjh 2012-09-03 18:52

[QUOTE=kladner;310174]Completed 2500 successfully. I'm running 500 iterations on memtestG80, on 1274MB vRAM--the largest amount I could get the program to allocate.

I'm starting to think that I should put the 570 back on mfaktc duty and run CuLu on the 460 if I can't hammer out any other solution.[/QUOTE]

I had a 580 that couldn't get good DCs, but it did just fine at mfaktc. I recommend running DCs with any GPU until you have verified its results several times.

kladner 2012-09-03 18:59

[QUOTE=flashjh;310176]I had a 580 that couldn't get good DCs, but it did just fine at mfaktc. I recommend running DCs with any GPU until you have verified its results several times.[/QUOTE]

Yeah. The 460 has turned out quite a few with no mismatches. It is pretty quick about it, too. I was just curious how fast the 570 would be at CuLu. Also, running mfaktc x3 on the 460 would let me run P-1 x3 on the CPU. The 570 doesn't saturate even with 4x mfaktc, though that was how I was running it.

chalsall 2012-09-03 19:04

[QUOTE=Dubslow;310175]"If it aint broke, don't fix it" -- a lazy engineer :smile:[/QUOTE]

"This is only temporary... Unless it works." - Red Green

[YOUTUBE]GLsyM2Crhew[/YOUTUBE]

flashjh 2012-09-03 19:04

[QUOTE=kladner;310177]Yeah. The 460 has turned out quite a few with no mismatches. It is pretty quick about it, too. I was just curious how fast the 570 would be at CuLu. Also, running mfaktc x3 on the 460 would let me run P-1 x3 on the CPU. The 570 doesn't saturate even with 4x mfaktc, though that was how I was running it.[/QUOTE]
How long does your 460 take for CL?

Dubslow 2012-09-03 19:27

[QUOTE=flashjh;310180]How long does your 460 take for CL?[/QUOTE]

Mine needs ~40 hrs for a DC.

kladner 2012-09-03 19:38

[QUOTE=flashjh;310180]How long does your 460 take for CL?[/QUOTE]

Roughly 36-48 hrs for DC, about twice that for LL. (From memory. I don't have records.)

Here is a Google search for driver removal and cleanup. (Just passing it along. I found some helpful items in this lot.)
[URL]https://www.google.com/search?q=remove+nvidia+drivers+completely&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a[/URL]

And here are test utilities for CUDA and OpenCL from the Stanford Folding@Home site.
[URL]http://folding.stanford.edu/English/DownloadUtils#ntoc2[/URL]

@chalsall: Too damned funny!

Just finished 500 iterations of memtestG80 on the GTX 570:
[CODE]Final error count after 500 iterations over 1140 MiB of GPU memory: 0 errors[/CODE]

I think I'd be happier if I could get a clear error report from some of these torture programs. I guess CuLu is just a more brutal torturer.

jrk 2012-09-03 20:08

[QUOTE=kladner;310007][CODE]Iteration 10000 M( 46069867 )C, 0x21048490b7febb41, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:25 real, 26.4671 ms/iter, ETA 338:33:31)
Iteration 20000 M( 46069867 )C, 0xc9c576910a569076, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:24 real, 26.3610 ms/iter, ETA 337:07:41)
Iteration 30000 M( 46069867 )C, 0x1deb3580b15cb791, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:23 real, 26.3625 ms/iter, ETA 337:04:25)
SIGINT caught, writing checkpoint. Estimated time spent so far: 16:00
[/CODE][/QUOTE]
kladner, since you posted this, it appears that sometimes you are able to run CUDALucas without problems for a while (like here) and other times you cannot run it correctly at all.

You have two GPUs in your system, correct? Have you checked that your power supply is functioning correctly (voltages) and is rated for enough wattage to run two GPUs along with the other components? (i.e. at least 800W, depending on what else you have installed).

kladner 2012-09-03 21:14

[QUOTE=jrk;310194]kladner, since you posted this, it appears that sometimes you are able to run CUDALucas without problems for a while (like here) and other times you cannot run it correctly at all.

You have two GPUs in your system, correct? Have you checked that your power supply is functioning correctly (voltages) and is rated for enough wattage to run two GPUs along with the other components? (i.e. at least 800W, depending on what else you have installed).[/QUOTE]

I have two in the system now. But I just put the GTX 460 back in. At Jerry's suggestion I did a lot of testing of CUDALucas with just the 570 installed. When I got the 570 I did have to get a bigger PSU, though it is only rated at 750W. ATM, with 4x mfaktc on the 570, and CuLu on the 460, and 2x P-1 on the Phenom II x6 1090t, the total draw from the line is ~675W.
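For a rough idea of how much headroom those figures leave, here is a minimal sketch. The function name `psu_load_fraction` is hypothetical, and the efficiency value is an assumption (something like 0.85 is plausible for a decent unit), not a measurement from this system. Power measured at the wall includes conversion losses, so the load on the PSU's rated output is somewhat lower than the wall reading.

```c
#include <assert.h>

/* Estimate what fraction of a PSU's rated DC output is in use, given the
 * power measured at the wall. `efficiency` is an ASSUMED conversion
 * efficiency (e.g. 0.85), not a value measured in this thread. */
double psu_load_fraction(double wall_watts, double efficiency, double rated_watts)
{
    assert(efficiency > 0.0 && efficiency <= 1.0);
    assert(rated_watts > 0.0);
    /* Power actually delivered to the components after conversion losses. */
    double dc_watts = wall_watts * efficiency;
    return dc_watts / rated_watts;
}
```

With 675 W at the wall, an assumed efficiency of 0.85 and a 750 W rating, this comes out to roughly three quarters of the rated output. That is only an estimate, and it says nothing about per-rail limits or voltage stability under load.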

Even though I could not make the 570 run CuLu, even by itself, this exercise has worked out in some senses. The 570 is now in the primary PCIe slot and it is cooling much better than it did in the secondary slot. Conversely, the 460 is running somewhat hotter, but that's OK. The 570 cranks out a lot more heat so I'm glad to see it a bit cooler.

EDIT: The fact that the 570 would occasionally run CuLu correctly (until a restart) was particularly frustrating. Still, I'm having to accept that there is something funky about that card. I wish I could pin it down.

kladner 2012-09-03 23:32

I am exploring support options with Gigabyte. The card is less than 2 years old. The warranty is 3 years.

kladner 2012-09-05 15:25

@Flash-

Thanks for your suggestion of getting the 570 running by itself. Even though I have not been able to resolve the CuLu issue, I have gotten my display feeding off the 570. Desktop responsiveness is much better, and I can now run CuLu on the 460 with Polite=0 full time without crippling general usability.

flashjh 2012-09-05 15:53

Sure, no problem. Just wish we could get your card working. Hopefully Gigabyte will swap it for you.

kladner 2012-09-05 15:57

[QUOTE=flashjh;310405]Sure, no problem. Just wish we could get your card working. Hopefully Gigabyte will swap it for you.[/QUOTE]

Still waiting for a response to my query.

kladner 2012-09-06 21:47

I just spent another half day going through driver removal and re-installation. I have settled on the detailed instructions given here:

[url]http://www.evga.com/forums/tm.aspx?high=&m=1174372&mpage=1#1174372[/url]

Some key points the poster emphasizes:[INDENT]1) Use Windows "Programs and Features" for uninstalling. Safe Mode is mostly not recommended since the PhysX installer cannot run in that state. Use of Driver Sweeper/Driver Fusion is discouraged.
2) When uninstalling, always do the display driver last.
3) Do NOT install anything but the display driver and PhysX. The 3D Vision drivers and the HD Audio driver are specifically discouraged as useless to most setups, and as being troublemakers. (He calls them, and nVidia Update, bloatware.)
[/INDENT]The forum post is quite long and goes into detail as to why he recommends these procedures for drivers from the 260 family to the present.

I tried driver 301.42, which is the current WHQL version, and betas 304.79 and 306.02. I'm currently running the last.

I did get a response today from Gigabyte support. The responder did not seem to be aware of GPUs being used in GIMPS. I have answered with more details about that part of GIMPS. I also pointed out that my Gigabyte GTX 460 runs CuLu without problems, and that there are other users with 570s who don't have the same problems.

The upshot is that CuLu still fails on the 570, so I'm back to using it for mfaktc and the 460 for CuLu.

The ball's back in their court.

kladner 2012-09-25 14:33

CUDALucas errors out @ particular iteration
 
After quite a number of successful LL and DC runs on the GTX 460, this morning I discovered CuLu wasn't running. After several attempts I captured the following information.
[CODE]mkdir: cannot create directory `savefiles': File exists
Continuing work from a partial result of M27278xxx fft length = 1440K iteration = 24700001
Iteration 24800000 M( 27278xxx )C, 0xc411eff38b7892cb, n = 1440K, CUDALucas v2.04 Beta err = 0.2969 (8:42 real, 5.2142 ms/iter, ETA 3:28:34)
Iteration = 24873570 >= 1000 && err = 0.35938 >= 0.35, fft length = 1440K, writing checkpoint file (because -t is enabled) and exiting.


RESTART:
Continuing work from a partial result of M27278527 fft length = 1440K iteration = 24300001
Iteration 24400000 M( 27278xxx )C, 0x3fe0d26bf5ef3efd, n = 1440K, CUDALucas v2.04 Beta err = 0.2910 (8:44 real, 5.2432 ms/iter, ETA 4:04:40)
Iteration 24500000 M( 27278xxx )C, 0xd679e2c74c32a974, n = 1440K, CUDALucas v2.04 Beta err = 0.2969 (8:45 real, 5.2509 ms/iter, ETA 3:56:17)
Iteration 24600000 M( 27278xxx )C, 0x9b52631a0b698d53, n = 1440K, CUDALucas v2.04 Beta err = 0.2891 (8:45 real, 5.2488 ms/iter, ETA 3:47:26)
Iteration 24700000 M( 27278xxx )C, 0x46cee10be9356d7a, n = 1440K, CUDALucas v2.04 Beta err = 0.3008 (8:44 real, 5.2414 ms/iter, ETA 3:38:23)
Iteration 24800000 M( 27278xxx )C, 0xc411eff38b7892cb, n = 1440K, CUDALucas v2.04 Beta err = 0.2969 (8:44 real, 5.2433 ms/iter, ETA 3:29:44)
Iteration = 24873570 >= 1000 && err = 0.35938 >= 0.35, fft length = 1440K, writing checkpoint file (because -t is enabled) and exiting.[/CODE]I would appreciate any suggestions. As you can see, this run was within a few hours of completion. The error level barely exceeded 0.35. Do I have to restart it with a higher FFT?

Sorry if this has been addressed before. I don't remember anything quite like it.

EDIT: If restarting the exponent is the only answer, I would really appreciate a suggested FFT.

EDIT2: I restarted with -t disabled and it has gone from It. 24873502 to 24900000 with the error reported as 0.2734.
It just reached It. 25000000 err = 0.2500.

Dubslow 2012-09-25 14:51

It seems to be a reproducible error, like Prime95 sometimes shows. Unfortunately, CUDALucas doesn't have a built-in way to override the error check and keep going. (Note that the error has been north of 0.25 for the whole test, so this exponent is [i]right[/i] on the edge, and probably should have used a higher FFT from the start. Do you know what the average error from the initial roundoff test at the beginning was? If it was right on the edge, then I'll probably decrease the allowable error for the initial test.)

The workaround I can think of is to save/pause, turn off -t, and then relaunch it and hope the error doesn't get caught.
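For readers unfamiliar with the error check being discussed, here is a minimal sketch of the idea, assuming the logged `err` is the usual LL-test round-off measure: after the inverse FFT every element should be very close to an integer, and the error is the largest distance from the nearest integer. The function names are hypothetical and this is not taken from the CUDALucas source; only the `>= 1000` and `>= 0.35` conditions come from the log message above.

```c
#include <stddef.h>

/* Largest distance of any element from its nearest integer.
 * Uses a plain cast instead of libm, so it assumes |x[i]| fits in a long long. */
double max_roundoff_error(const double *x, size_t n)
{
    double max_err = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Fractional part relative to truncation toward zero. */
        double frac = x[i] - (double)(long long)x[i];
        if (frac < 0.0)
            frac = -frac;
        /* Distance to the NEAREST integer is at most 0.5. */
        if (frac > 0.5)
            frac = 1.0 - frac;
        if (frac > max_err)
            max_err = frac;
    }
    return max_err;
}

/* Returns 1 if the run should stop, mirroring the logged condition
 * "Iteration >= 1000 && err >= 0.35". */
int should_abort(long iteration, double err)
{
    return iteration >= 1000 && err >= 0.35;
}
```

An error of 0.5 would mean the value sits exactly between two integers, so rounding could go either way. A run that hovers near 0.30 and occasionally touches 0.36 is close enough to that point that the check stops it.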

kladner 2012-09-25 14:56

I don't know the average error at the beginning, but I can restart it and see what comes out.

At the moment it is still running the continuation with -t 0 and has passed the sticking point. I'll be happy to run any tests requested.

Update: [CODE]Iteration 25100000 M( 27278xxx )C, 0x45cc61216a1a3dce, n = 1440K, CUDALucas v2.04 Beta err = 0.2656 (8:41 real, 5.2105 ms/iter, ETA 3:02:22)[/CODE]

Dubslow 2012-09-25 15:02

[QUOTE=kladner;312747]I don't know the average error at the beginning, but I can restart it and see what comes out. [/quote]Thanks.
[QUOTE=kladner;312747]
At the moment it is still running the continuation with -t 0 and has passed the sticking point. I'll be happy to run any tests requested.[/QUOTE]
Since you asked for it... this isn't the first time a too-aggressive FFT length has been picked. If you can, please run a test of the dinky little program I posted in [URL="http://www.mersenneforum.org/showthread.php?p=306898#post306898"]this first discussion[/URL] of the issue. (Unfortunately, I can't compile it, so you'll have to ask flash or start playing around with MSVS -- it's a very simple program, and so should be quite a bit easier to compile than CUDALucas.)

Edit: Use this slight revision (MSVS is ancient and uses ancient rules). The discussion linked above is still worth a read though, IMO.
[code]#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void print_time_from_seconds (int sec) // copied almost verbatim from CuLu source
{
  if (sec > 3600)
    {
      printf ("%d", sec / 3600);
      sec %= 3600;
      printf (":%02d", sec / 60);
    }
  else
    printf ("%d", sec / 60);
  sec %= 60;
  printf (":%02d\n", sec);
}

int main(int argc, char** argv) {
  char* name, * newname;
  int q, n, j, old, new;
  size_t len;
  long t;
  double* x;
  FILE* f;

  if( argc < 4 || !argv[1] || !argv[2] || !argv[3] ) {
    printf("First argument should be name of checkpoint file, second should be old FFT (full form, not K form), and third should be new FFT\n");
    return -1;
  }
  name = argv[1]; old = atoi(argv[2]); new = atoi(argv[3]);
  f = fopen(name, "rb"); // Ignore compiler warnings about "secure functions"
  if( !f ) {
    printf("Can't open %s, aborting\n", name);
    return -1;
  }
  fread(&q, sizeof(int), 1, f); // exponent
  fread(&n, sizeof(int), 1, f); // FFT length stored in the checkpoint
  if( n != old) {
    printf("Supplied old length doesn't match checkpoint's old length, aborting\n");
    fclose(f);
    return -1;
  }
  fread(&j, sizeof(int), 1, f); // iteration
  x = (double*) calloc(new > old ? new : old, sizeof(double)); // zero-filled, big enough for either length
  fread(x, sizeof(double), old, f);
  fread(&t, sizeof(long), 1, f); // comment out this line for 2.03 save files
  fclose(f);
  printf("This is a checkpoint for exp = %d, n = %dK, iter = %d, and total time = %ld = ", q, n/1024, j, t);
  print_time_from_seconds(t);
  printf("Converting from FFT %d to FFT %d\n", old, new);
  len = strlen(name)+1;
  newname = calloc((len+=4), sizeof(char)); // room for ".new"
  snprintf(newname, len, "%s.new", name);
  f = fopen(newname, "wb");
  fwrite(&q, sizeof(int), 1, f);
  fwrite(&n, sizeof(int), 1, f); // writes n, the length read from the input file
  fwrite(&j, sizeof(int), 1, f);
  fwrite(x, sizeof(double), new, f);
  fwrite(&t, sizeof(long), 1, f); // comment this out for 2.03 save files
  fclose(f);
  printf("Written new checkpoint.\n");
  return 127;
}[/code]
[code]bill@Gravemind:~/CUDALucas∰∂ ckpconvert t27812929 1572864 1638400
This is a checkpoint for exp = 27812929, n = 1536K, iter = 140001, and total time = 869 = 14:29
Converting from FFT 1572864 to FFT 1638400
Written new checkpoint.[/code]
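To check what a checkpoint actually contains before and after a conversion, it is enough to read the three leading fields. This is a minimal sketch, assuming the layout implied by the converter above (exponent, FFT length, iteration, each stored as a native `int`); the struct and function names are hypothetical and are not part of CUDALucas.

```c
#include <stdio.h>

/* Header fields in the order the converter above reads them. */
struct ckpt_header {
    int q; /* exponent */
    int n; /* FFT length (full form, not K form) */
    int j; /* iteration */
};

/* Reads the three leading ints from an open checkpoint stream.
 * Returns 0 on success, -1 if the stream is too short. */
int read_ckpt_header(FILE *f, struct ckpt_header *h)
{
    if (fread(&h->q, sizeof(int), 1, f) != 1) return -1;
    if (fread(&h->n, sizeof(int), 1, f) != 1) return -1;
    if (fread(&h->j, sizeof(int), 1, f) != 1) return -1;
    return 0;
}
```

Open the file with `fopen(name, "rb")`, call `read_ckpt_header`, and print `h.n`. Comparing that value for the input and the `.new` output shows directly whether the stored FFT length changed.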

kladner 2012-09-25 15:17

[QUOTE]If you can, please run a test of the dinky little program[/QUOTE]I'm afraid that's a bit out of my depth (compiling).

Flash, if you're watching this can you help?

Give me a few minutes and I'll recreate the beginning info for the exponent.

EDIT: [CODE]Starting M27278xxx fft length = 1440K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.17082, max error = 0.24316
Iteration 200, average error = 0.19356, max error = 0.22656
Iteration 300, average error = 0.20363, max error = 0.25000
Iteration 400, average error = 0.20928, max error = 0.24316
Iteration 500, average error = 0.21295, max error = 0.27344
Iteration 600, average error = 0.21491, max error = 0.24609
Iteration 700, average error = 0.21601, max error = 0.24609
Iteration 800, average error = 0.21788, max error = 0.27344
Iteration 900, average error = 0.21798, max error = 0.23438
Iteration 1000, average error = 0.21805 < 0.25 (max error = 0.23438), continuing test.[/CODE]
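The decision printed on the last line of that log can be sketched as follows. This is an illustration only, assuming the test simply averages the per-iteration errors and compares the average against the 0.25 limit stated in the log; the function name is hypothetical and the real implementation may differ.

```c
#include <stddef.h>

/* Average the per-iteration errors and compare against `limit`
 * (0.25 in the log above). Returns 1 to continue with the current
 * FFT length, 0 to restart with a larger one. The average is also
 * stored in *avg_out when avg_out is not NULL. */
int roundoff_test_passes(const double *errs, size_t count, double limit, double *avg_out)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += errs[i];
    double avg = count ? sum / (double)count : 0.0;
    if (avg_out)
        *avg_out = avg;
    return avg < limit;
}
```

Because only the average is compared, individual spikes can exceed the limit without failing the test. In the log above the max error reaches 0.27344 twice while the average stays near 0.21, which fits an exponent sitting close to the edge for this FFT length.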

flashjh 2012-09-25 18:34

1 Attachment(s)
I compiled the program, but it doesn't work for me. I get the right output, but the checkpoint still contains the 'old' FFT length. I tried running it in CuLu 2.04beta and it still used the old FFT, so then I tried converting from new to old and that doesn't work either (see below).

[CODE]c:\CUDA\ck>ck c27232109 1572864 1638400
This is a checkpoint for exp = 27232109, n = 1536K, iter = 2272301, and total time = 52909 = 14:41:49
Converting from FFT 1572864 to FFT 1638400
c27232109.new
Written new checkpoint.

c:\CUDA\ck>ck c27232109.new 1638400 1572864
Supplied old length doesn't match checkpoint's old length, aborting[/CODE]

Any ideas, Dubslow? I would look at it more, but I have to get back to work for now.

Code I used is attached.

kladner 2012-09-25 19:02

Thanks Jerry! :smile:

Here are my latest results. M27278527 completed and matched the previous test, so I submitted it. I caught it just after the next exponent had run the 1000 iterations.
[CODE]For 27278527
Iteration 1000, average error = 0.21805 < 0.25 (max error = 0.23438), continuing test.

For 27278xxx
Iteration 1000, average error = 0.22508 < 0.25 (max error = 0.26563), continuing test.[/CODE]Since the average and max errors for the latter are higher than with 27278527, I ran [CODE]-cufftbench 32768 3276800 32768[/CODE]I looked at the results, and the next larger efficient FFT is 1536K. I put that on the worktodo line as instructed in CUDALucas.ini like this- [CODE]DoubleCheck=[KEY],27278xxx,1536K[/CODE]
This yielded-
[CODE]Iteration 1000, average error = 0.04833 < 0.25 (max error = 0.05371), continuing test.[/CODE]

It has not run long enough to determine the timing, but might be a bit slower than 1440K.
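The FFT length above was picked from the benchmark timings. As a cross-check, lengths whose only prime factors are small tend to be the fast ones, so a smoothness test gives a quick list of candidates. The sketch below assumes 7-smoothness as the criterion and reuses the 32768 step from the benchmark command; both function names are hypothetical, and smoothness is only a proxy for what the benchmark actually measures.

```c
#include <assert.h>

/* Returns 1 if n has no prime factor larger than 7. */
int is_7_smooth(unsigned long n)
{
    static const unsigned long primes[] = {2, 3, 5, 7};
    if (n == 0)
        return 0;
    for (int i = 0; i < 4; i++)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;
}

/* Smallest 7-smooth multiple of `step` strictly greater than `current`,
 * or 0 if there is none up to and including `limit`. */
unsigned long next_smooth_length(unsigned long current, unsigned long step,
                                 unsigned long limit)
{
    assert(step > 0);
    unsigned long n = (current / step + 1) * step;
    for (; n <= limit; n += step)
        if (is_7_smooth(n))
            return n;
    return 0;
}
```

Starting from 1440K (1474560) with a step of 32768, the multipliers 46 and 47 contain the primes 23 and 47, so the first smooth candidate is 48 × 32768 = 1572864, which is 1536K. That agrees with the length chosen from the benchmark.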

Dubslow 2012-09-25 19:23

[QUOTE=flashjh;312755]I compiled the program, but it doesn't work for me. I get the right output, but the checkpoint still contains the 'old' FFT length. I tried running it in CuLu 2.04beta and it still used the old FFT, so then I tried converting from new to old and that doesn't work either (see below).

[CODE]c:\CUDA\ck>ck c27232109 1572864 1638400
This is a checkpoint for exp = 27232109, n = 1536K, iter = 2272301, and total time = 52909 = 14:41:49
Converting from FFT 1572864 to FFT 1638400
c27232109.new
Written new checkpoint.

c:\CUDA\ck>ck c27232109.new 1638400 1572864
Supplied old length doesn't match checkpoint's old length, aborting[/CODE]

Any ideas, Dubslow? I would look at it more, but I have to get back to work for now.

Code I used is attached.[/QUOTE]
:doh!:

Line 56: "fwrite(&n, sizeof(int), 1, f);" should be "fwrite(&new, sizeof(int), 1, f);".

:davieddy:
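With that change, the write step stores the new length in the header, so the header agrees with the number of values that follow it. A minimal sketch, assuming the same field order the converter uses; the function name `write_ckpt` is hypothetical. It only covers the header fix. Whether CUDALucas can actually resume from zero-padded data at a different FFT length is not established here.

```c
#include <stdio.h>

/* Writes a checkpoint in the field order used by the converter in this thread:
 * exponent, FFT length, iteration, data array, elapsed time.
 * The length written is new_n, which is also the number of doubles written.
 * Returns 0 on success, -1 on a short write. */
int write_ckpt(FILE *f, int q, int new_n, int j, const double *x, long t)
{
    if (new_n < 0) return -1;
    if (fwrite(&q, sizeof(int), 1, f) != 1) return -1;
    if (fwrite(&new_n, sizeof(int), 1, f) != 1) return -1;
    if (fwrite(&j, sizeof(int), 1, f) != 1) return -1;
    if (fwrite(x, sizeof(double), (size_t)new_n, f) != (size_t)new_n) return -1;
    if (fwrite(&t, sizeof(long), 1, f) != 1) return -1;
    return 0;
}
```

Running the converter a second time on its own output, as in the failed round trip quoted above, is then a quick test: the "Supplied old length doesn't match" message should no longer appear when the stored length is the new one.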

kladner 2012-09-25 20:12

For very similar exponents, 1536K is ~0.34 ms slower (94%) than 1440K on a GTX 460.

kladner 2012-09-25 22:12

[QUOTE=kladner;312759]For very similar exponents, 1536K is ~0.34 ms slower (94%) than 1440K on a GTX 460.[/QUOTE]

I have to walk this back. 1536K now seems to be about 7% faster. I'm not sure why the difference, though it is after a reboot.

Dubslow 2012-09-25 23:31

[QUOTE=kladner;312768]I have to walk this back. 1536K now seems to be about 7% faster. I'm not sure why the difference, though it is after a reboot.[/QUOTE]

:huh:

I was not expecting that.

flashjh 2012-09-25 23:48

[QUOTE=Dubslow;312775]:huh:

I was not expecting that.[/QUOTE]

Some testing still needs to be done, but LaurV put together a list of FFTs that perform better [URL="http://www.mersenneforum.org/showthread.php?p=310136#post310136"]here[/URL].

It may be worthwhile to do testing on your 460 and see if the results match.


All times are UTC.