mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

flashjh 2012-09-02 04:50

[QUOTE=kladner;310013]Hi Jerry,

These are the results of only a partial CUDALucas-2.04-Beta-3.2-sm_13-x64 -r > output.txt. I'm letting it run again and will be more patient this time. How long should it be expected to run?[/QUOTE]
From the source code for -r:
[CODE]
check (86243, "23992ccd735a03d9");
check (132049, "4c52a92b54635f9e");
check (216091, "30247786758b8792");
check (756839, "5d2cbe7cb24a109a");
check (859433, "3c4ad525c2d0aed0");
check (1257787, "3f45bf9bea7213ea");
check (1398269, "a4a6d2f0e34629db");
check (2976221, "2a7111b7f70fea2f");
check (3021377, "6387a70a85d46baf");
check (6972593, "88f1d2640adb89e1");
check (13466917, "9fdc1f4092b15d69");
check (20996011, "5fc58920a821da11");
check (24036583, "cbdef38a0bdc4f00");
check (25964951, "62eb3ff0a5f6237c");
check (30402457, "0b8600ef47e69d27");
check (32582657, "02751b7fcec76bb1");
check (37156667, "67ad7646a1fad514");
check (42643801, "8f90d78d5007bba7");
check (43112609, "e86891ebf6cd70c4");
[/CODE]

From your output.txt, you were on 24036583, so you were over half done, but each check gets slower as the exponents get larger. I can't remember how long it takes on a 580. So far it is working on your 570. Back to the drawing board? Did you try 3.2 yet? Does it do the same thing? Please post the final -r results when they complete. Thanks.
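
For context on what -r exercises: every exponent in that list is a known Mersenne prime exponent, and the underlying test is Lucas-Lehmer. Below is a minimal Python sketch of the recurrence using plain big integers instead of the FFT multiplication CUDALucas relies on. The function names are made up for illustration, and `residue64` is only an assumption about what the hex strings represent; the post does not say at which iteration the self-test compares them.

```python
def lucas_lehmer_residue(p, iterations=None):
    """Run the Lucas-Lehmer recurrence s -> s*s - 2 (mod 2**p - 1).

    Starts from s = 4 and performs `iterations` steps (default p - 2,
    which is the full test). Returns the final value of s.
    """
    m = (1 << p) - 1
    if iterations is None:
        iterations = p - 2
    s = 4
    for _ in range(iterations):
        s = (s * s - 2) % m
    return s


def is_mersenne_prime(p):
    """True if 2**p - 1 is prime, for an odd prime exponent p."""
    return lucas_lehmer_residue(p) == 0


def residue64(p, iterations):
    """Low 64 bits of an interim residue as 16 hex digits.

    Hypothetical helper: assumes the self-test strings are 64-bit residues.
    """
    low = lucas_lehmer_residue(p, iterations) & 0xFFFFFFFFFFFFFFFF
    return format(low, "016x")
```

This is only practical for small exponents (for example `is_mersenne_prime(127)`); the exponents in the -r list need FFT-based squaring to finish in reasonable time, which is why the later checks take so much longer.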

flashjh 2012-09-02 04:58

[QUOTE=kladner;310003][,1920K]
[CODE]Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2560K
[/CODE][/QUOTE]


[QUOTE=kladner;310007]1728 is still running, longer than any previous attempt. It seems to have passed the crisis point which took down the others. I'll give it a while longer and post the output, but this looks promising.
[CODE]Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.07196, max error = 0.10547
Iteration 200, average error = 0.08213, max error = 0.10156
Iteration 300, average error = 0.08600, max error = 0.10156
Iteration 400, average error = 0.08815, max error = 0.10938
Iteration 500, average error = 0.08854, max error = 0.10156
Iteration 600, average error = 0.08882, max error = 0.10156
Iteration 700, average error = 0.08948, max error = 0.10156
Iteration 800, average error = 0.08965, max error = 0.09766
Iteration 900, average error = 0.08963, max error = 0.09766
Iteration 1000, average error = 0.08976 < 0.25 (max error = 0.10156), continuing test.
Iteration 10000 M( 46069867 )C, 0x21048490b7febb41, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:25 real, 26.4671 ms/iter, ETA 338:33:31)
Iteration 20000 M( 46069867 )C, 0xc9c576910a569076, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:24 real, 26.3610 ms/iter, ETA 337:07:41)
Iteration 30000 M( 46069867 )C, 0x1deb3580b15cb791, n = 2560K, CUDALucas v2.04 Beta err = 0.1172 (4:23 real, 26.3625 ms/iter, ETA 337:04:25)
SIGINT caught, writing checkpoint. Estimated time spent so far: 16:00
[/CODE][/QUOTE]

These are two of your runs and they have me concerned. The same FFT length worked one time and not the other? The only thing you changed is the starting FFT. Is this reproducible using 1920K and 1728K?

Let me know. I'd like to zip up my CuLu run directory and email it to you for a test, to see if one of your files is corrupted. Can you PM me your email address?

kladner 2012-09-02 05:08

1 Attachment(s)
This run completed in a -t style termination. This run and the last were on CUDALucas-2.04-Beta-3.2-sm_13-x64. This would certainly be my preferred version if the current situation will unkink.

I'll send you an address via PM.

Thanks again.
Kieren

flashjh 2012-09-02 05:12

@Dubslow:
Here are the last few lines of the termination:
[CODE]
Iteration 900, average error = 0.03671, max error = 0.04199
Iteration 1000, average error = 0.03675 < 0.25 (max error = 0.04102), continuing test.
Iteration = 1341 >= 1000 && err = 0.5 >= 0.35, fft length = 160K, writing checkpoint file (because -t is enabled) and exiting.
[/CODE]
I remember reading somewhere(?) about Prime95 having a problem reporting incorrect rounding results. If you look at the run, the error is ~0.03 (max 0.04) and then jumps to 0.5; that doesn't make sense. Do you remember reading the Prime95 error stuff? It seems like it was during testing of 27.7. Could this be related?
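
For anyone following along: the error being reported is, as far as I understand it, the distance between each floating-point FFT output and the nearest integer, so 0.5 is the largest value it can ever take. A rough Python sketch of that check follows. The 0.25 and 0.35 limits are taken from the log messages in this thread; the function names and the decision logic are assumptions reconstructed from those messages, not CUDALucas's actual code.

```python
def roundoff_error(values):
    """Largest distance from any value to its nearest integer (0.0 to 0.5)."""
    return max(abs(v - round(v)) for v in values)


def judge_iteration(err, avg_err, iteration, careful_iters=1000,
                    avg_limit=0.25, max_limit=0.35):
    """Mimic the decisions visible in the posted logs (assumed logic)."""
    if err >= max_limit:
        # During the careful phase the FFT length is bumped;
        # afterwards (with -t) the run checkpoints and stops.
        if iteration < careful_iters:
            return "increase fft"
        return "checkpoint and exit"
    if iteration == careful_iters and avg_err >= avg_limit:
        return "increase fft"
    return "continue"
```

With the numbers from the termination above, `judge_iteration(0.5, 0.03675, 1341)` returns `"checkpoint and exit"`, which is why a single bad iteration ends an otherwise clean run.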

@kladner: PM received, I'll send you what I have for testing.

kladner 2012-09-02 13:26

1 Attachment(s)
@Jerry: PM received. A typical output.txt is attached.

EDIT: It is strange that CL ran OK with ",1728K" appended. That was a limited-time test, though, so I don't know if it would have continued. My general experience has been that CL will sometimes run from the beginning (no checkpoint files), but fail on a restart.

flashjh 2012-09-02 13:32

At this point, I am inclined to say your 570 has a memory problem. It has been a while, but somewhere back in this thread we discussed memory issues. For TF it's not that big a deal, but for CL it's critical. When I get some time later, I'll flip back and see what I can find for testing. We don't have a working copy of the GPU mem test for Windows yet.

Is the 570 overclocked?

kladner 2012-09-02 14:09

The 570 is running at nVidia (not Gigabyte) stock: 732MHz. The factory OC would be 781MHz.

I have a copy of Memtest G80. I uploaded it here:

Download URL: [URL]http://www.gigasize.com/get/lvxtp94xrkc[/URL]

I just ran it on the 570 for 50 iterations without errors. In the past I've run it for several hundred. This is the command line I use:

[CODE]memtestg80 -g 0 -b 1150 50[/CODE]
-g is GPU number
-b blocks the program from calling home to Stanford without a prompt
1150 is about the largest amount of memory in MB I can get to test
50 is the number of iterations
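
To repeat that test from a script (for longer runs or logging), the command can be assembled like this. It is a small Python sketch: the executable name and flags are the ones from the command line above, while `build_memtest_cmd` and `run_memtest` are hypothetical helper names.

```python
import subprocess


def build_memtest_cmd(gpu=0, megabytes=1150, iterations=50, exe="memtestg80"):
    """Build the argument list: -g selects the GPU, -b skips the Stanford prompt."""
    return [exe, "-g", str(gpu), "-b", str(megabytes), str(iterations)]


def run_memtest(**kwargs):
    """Run the test and return its exit code (needs memtestg80 on the PATH)."""
    return subprocess.run(build_memtest_cmd(**kwargs)).returncode
```

For example, `build_memtest_cmd(iterations=300)` gives the argument list for one of the longer several-hundred-iteration runs.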

Of course, this does not completely prove that the VRAM is sound. This card also ran at various speeds on OCCT without errors. Unfortunately, OCCT only seems to work on the active display GPU, which is the 460, so I can't retest the 570 right now.


Here are the documented switches for the program:

[CODE]F:\Dnld\Memtest86\memtestG80-1.1-windows>memtestG80 /?
-------------------------------------------------------------
| MemtestG80 v1.00 |
| |
| Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters] |
| |
| Defaults: GPU 0, 128MB RAM, 50 test iterations |
| Amount of tested RAM will be rounded up to nearest 2MB |
-------------------------------------------------------------

Available flags:
--gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
--license ,-l : show license terms for this build
--forcecomm, -f : DO send test results to Stanford (don't prompt)
--bancomm, -b : DO NOT send test results to Stanford (don't prompt)
--ramclock X , -r X: Specify RAM clock speed (for returned results) as X MHz
--coreclock X , -c X: Specify core/ROP clock speed (for returned results) as X MHz[/CODE]

flashjh 2012-09-02 14:31

[QUOTE=kladner;310048]The 570 is running at nVidia (not Gigabyte) stock: 732MHz. The factory OC would be 781MHz.

I have a copy of Memtest G80. I uploaded it here:

Download URL: [URL]http://www.gigasize.com/get/lvxtp94xrkc[/URL]

I just ran it on the 570 for 50 iterations without errors. In the past I've run it for several hundred. This is the command line I use:

[CODE]memtestg80 -g 0 -b 1150 50[/CODE]
-g is GPU number
-b blocks the program from calling home to Stanford without a prompt
1150 is about the largest amount of memory in MB I can get to test
50 is the number of iterations
[/QUOTE]
I didn't know about that program. At least the initial indication is that it's ok.

Anyone else have any ideas?

Edit: So I don't know why I didn't try the actual exponent before, but here is what I get for my GT 430:

[CODE]C:\CUDA2>cudalucas
------- DEVICE 1 -------
name GeForce GT 430
totalGlobalMem 1073741824
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
Compatibility 2.1
clockRate (MHz) 1400
textureAlignment 512
deviceOverlap 1
multiProcessorCount 2
Starting M46069867 fft length = 1536K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1536K
Starting M46069867 fft length = 1600K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1600K
Starting M46069867 fft length = 1728K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1728K
Starting M46069867 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1920K
Starting M46069867 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2048K
Starting M46069867 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M46069867 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.47656 >= 0.35, increasing n from 2400K
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.06564, max error = 0.09375
Iteration 200, average error = 0.07594, max error = 0.09375
Iteration 300, average error = 0.08075, max error = 0.09839
Iteration 400, average error = 0.08302, max error = 0.10156
Iteration 500, average error = 0.08393, max error = 0.10156
Iteration 600, average error = 0.08452, max error = 0.10156
Iteration 700, average error = 0.08507, max error = 0.10938
Iteration 800, average error = 0.08543, max error = 0.09375
Iteration 900, average error = 0.08551, max error = 0.09375
Iteration 1000, average error = 0.08555 < 0.25 (max error = 0.09375), continuing test.
Iteration 1000 M( 46069867 )C, 0xc2e28e698d96a6a5, n = 2560K, CUDALucas v2.04 Beta err = 0.0000 (0:40 real, 398.6551 ms/iter, ETA 5101:32:40)
Iteration 1100 M( 46069867 )C, 0x26edb553f2dde8c5, n = 2560K, CUDALucas v2.04 Beta err = 0.1016 (0:04 real, 37.2975 ms/iter, ETA 477:17:25)
Iteration 1200 M( 46069867 )C, 0x447233a3146c56c2, n = 2560K, CUDALucas v2.04 Beta err = 0.0859 (0:04 real, 36.8193 ms/iter, ETA 471:10:14)
Iteration 1300 M( 46069867 )C, 0x9bb8ac1450ec2bc3, n = 2560K, CUDALucas v2.04 Beta err = 0.0938 (0:03 real, 36.7898 ms/iter, ETA 470:47:31)
Iteration 1400 M( 46069867 )C, 0xa2a39d4ce7fa4ae5, n = 2560K, CUDALucas v2.04 Beta err = 0.0898 (0:04 real, 36.7626 ms/iter, ETA 470:26:34)
Iteration 1500 M( 46069867 )C, 0x5eb5481a8ef7ce27, n = 2560K, CUDALucas v2.04 Beta err = 0.1133 (0:04 real, 37.2806 ms/iter, ETA 477:04:12)
Iteration 1600 M( 46069867 )C, 0x41ae1d631ed147d8, n = 2560K, CUDALucas v2.04 Beta err = 0.0938 (0:04 real, 37.0722 ms/iter, ETA 474:24:07)
Iteration 1700 M( 46069867 )C, 0xa9fe7fa6770b4cf9, n = 2560K, CUDALucas v2.04 Beta err = 0.0938 (0:03 real, 36.7744 ms/iter, ETA 470:35:26)
Iteration 1800 M( 46069867 )C, 0xda20917b2a910d81, n = 2560K, CUDALucas v2.04 Beta err = 0.0938 (0:04 real, 36.7276 ms/iter, ETA 469:59:27)
Iteration 1900 M( 46069867 )C, 0xeffa01cc22355f33, n = 2560K, CUDALucas v2.04 Beta err = 0.0977 (0:04 real, 37.1133 ms/iter, ETA 474:55:30)
SIGINT caught, writing checkpoint. Estimated time spent so far: 1:17[/CODE]
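
As a sanity check on the ETA column: it is roughly the remaining iterations multiplied by the time per iteration. A quick Python sketch, assuming a full test takes about as many iterations as the exponent (the helper name is made up):

```python
def eta_hms(exponent, iteration, ms_per_iter):
    """Estimate remaining time as H:MM:SS from the per-iteration timing."""
    remaining = exponent - iteration
    seconds = int(remaining * ms_per_iter / 1000)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

Calling `eta_hms(46069867, 1300, 36.7898)` lands within a few seconds of the 470:47:31 printed at iteration 1300, so the ETA figures in the log are consistent with the reported ms/iter.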

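So it didn't fail outright, but it does the same thing: every FFT length below 2560K trips the round-off check within the first 32 iterations before the run settles at 2560K. I'll do some more testing later.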

kladner 2012-09-02 14:54

I tried another run with 2560K plugged in (since that worked for you), but the results are the same except for a different final error message.
[CODE]E:\CUDA>CUDALucas-2.04-Beta-3.2-sm_13-x64

mkdir: cannot create directory `savefiles': File exists
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.39999 >= 0.35, increasing n from 2560K
Starting M46069867 fft length = 2880K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 2880K
Starting M46069867 fft length = 3072K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3072K
Starting M46069867 fft length = 3200K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3200K
Starting M46069867 fft length = 3456K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3456K
Starting M46069867 fft length = 3840K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.49997 >= 0.35, increasing n from 3840K
Starting M46069867 fft length = 4000K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4000K
Starting M46069867 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4096K
Starting M46069867 fft length = 4608K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4608K
Starting M46069867 fft length = 4800K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 4800K
Starting M46069867 fft length = 5120K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 5120K
Starting M46069867 fft length = 5760K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 5760K
Starting M46069867 fft length = 6144K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6144K
Starting M46069867 fft length = 6400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 6400K
Starting M46069867 fft length = 6912K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
[B][COLOR=Red]CUDALucas.cu(693) : cudaSafeCall() Runtime API error 30: unknown error.
[/COLOR][/B](Emphasis added.)
E:\CUDA>[/CODE]
EDIT: More head scratching here. I just started another run with ",2560K" appended to the worktodo line. Still running ATM.
[CODE]E:\CUDA>CUDALucas-2.04-Beta-3.2-sm_13-x64

mkdir: cannot create directory `savefiles': File exists
Starting M46069867 fft length = 1728K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1728K
Starting M46069867 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 1920K
Starting M46069867 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2048K
Starting M46069867 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2304K
Starting M46069867 fft length = 2400K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.46875 >= 0.35, increasing n from 2400K
Starting M46069867 fft length = 2560K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.06860, max error = 0.10938
Iteration 200, average error = 0.07823, max error = 0.10547
Iteration 300, average error = 0.08137, max error = 0.09375
Iteration 400, average error = 0.08270, max error = 0.10156
Iteration 500, average error = 0.08382, max error = 0.09375
Iteration 600, average error = 0.08443, max error = 0.09375
Iteration 700, average error = 0.08478, max error = 0.09375
Iteration 800, average error = 0.08481, max error = 0.09375
Iteration 900, average error = 0.08498, max error = 0.09375
Iteration 1000, average error = 0.08530 < 0.25 (max error = 0.09375), continuing test.[/CODE]
EDIT2: After a system restart, CL on the 570 failed with the following:
[CODE]\CUDA>CUDALucas-2.04-Beta-3.2-sm_13-x64

mkdir: cannot create directory `savefiles': File exists
Continuing work from a partial result of M46069867 fft length = 2560K iteration = 11070
Iteration = 11073 >= 1000 && err = 0.5 >= 0.35, fft length = 2560K, writing checkpoint file (because -t is enabled) and exiting.[/CODE]
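
Since these logs get long, here is a small Python sketch for summarizing which FFT lengths were rejected and which one the run ended up on. The regular expressions are written against the message wording shown in this thread; the function name is hypothetical, and other CUDALucas versions may word the messages differently.

```python
import re

# Patterns match the log lines as they appear in the posts above.
START = re.compile(r"Starting M(\d+) fft length = (\d+)K")
BUMP = re.compile(r"err = ([0-9.]+) >= [0-9.]+, increasing n from (\d+)K")


def summarize_fft_attempts(log_text):
    """Return (rejected, last_started).

    rejected is a list of (fft_length_in_K, error) pairs in log order;
    last_started is the most recent FFT length the run began with.
    """
    rejected, last_started = [], None
    for line in log_text.splitlines():
        m = START.search(line)
        if m:
            last_started = int(m.group(2))
            continue
        m = BUMP.search(line)
        if m:
            rejected.append((int(m.group(2)), float(m.group(1))))
    return rejected, last_started
```

Feeding it an output.txt makes it easy to compare runs, for instance to see whether the same lengths fail with the same error values each time.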

flashjh 2012-09-02 15:02

Can someone try this exponent on Linux and post the results? 46069867 (Not the whole run, just need to know if it works or not.)

kladner 2012-09-02 16:01

I've tried 1728K and 1920K with similar results. Here are the respective final lines.
[CODE]Iteration 1000, average error = 0.08530 < 0.25 (max error = 0.09375), continuing test.
Iteration = 1962 >= 1000 && err = 0.5 >= 0.35, fft length = 2560K, writing checkpoint file (because -t is enabled) and exiting.

Iteration 1000, average error = 0.00167 < 0.25 (max error = 0.00183), continuing test.
Iteration = 2541 >= 1000 && err = 0.49985 >= 0.35, fft length = 3072K, writing checkpoint file (because -t is enabled) and exiting.

[/CODE]
A question occurs to me. I have had some trouble getting nVidia drivers installed such that mfaktc will run without crippling Interrupt overhead. I am currently running 301.42. In the past I got along best with 285.62. However, my understanding is that mfaktc 0.19 requires the later driver. Am I correct in this belief? I'd be delighted to get back to 285.62 if things would run correctly with it.

EDIT: If an FFT is specified, why does CL go through the Auto routine anyway?

