mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

chalsall 2013-04-19 20:11

[QUOTE=TheJudger;337639]The power supply could cause issues, too. Or the temperature inside the chassis. If you stress both CPU and GPU, the system draws more power than when only the GPU is loaded.[/QUOTE]

Could. Probably doesn't in this case.

This machine is very well fed with both cool air and conditioned power.

And I've gone out of my way to stress the GPU and CPUs independently.

There is a high likelihood that something else is going on here....

henryzz 2013-04-19 21:34

[QUOTE=chalsall;337640]Could. Probably doesn't in this case.

This machine is very well fed with both cool air and conditioned power.

And I've gone out of my way to stress the GPU and CPUs independently.

There is a high likelihood that something else is going on here....[/QUOTE]
Independently is possibly the key. Have you stressed your GPU while running the Prime95 torture test?

chalsall 2013-04-19 21:54

[QUOTE=henryzz;337643]Independently is possibly the key. Have you stressed your GPU while running the Prime95 torture test?[/QUOTE]

Have now:

[CODE][chalsall@hobbit 1]$ ./mprime -t
[Main thread Apr 19 17:40] Starting workers.
[Work thread Apr 19 17:40] Worker starting
[Work thread Apr 19 17:40] Worker starting
[Work thread Apr 19 17:40] Setting affinity to run worker on logical CPU #5
[Work thread Apr 19 17:40] Worker starting
[Work thread Apr 19 17:40] Setting affinity to run worker on logical CPU #6
[Work thread Apr 19 17:40] Beginning a continuous self-test to check your computer.
[Work thread Apr 19 17:40] Please read stress.txt. Hit ^C to end this test.
[Work thread Apr 19 17:40] Worker starting
[Work thread Apr 19 17:40] Worker starting
[Work thread Apr 19 17:40] Worker starting
[Work thread Apr 19 17:40] Setting affinity to run worker on logical CPU #4
[Work thread Apr 19 17:40] Setting affinity to run worker on logical CPU #3
[Work thread Apr 19 17:40] Worker starting
[Work thread Apr 19 17:40] Setting affinity to run worker on logical CPU #1
[Work thread Apr 19 17:40] Beginning a continuous self-test to check your computer.
[Work thread Apr 19 17:40] Please read stress.txt. Hit ^C to end this test.
[Work thread Apr 19 17:40] Worker starting
[Work thread Apr 19 17:40] Setting affinity to run worker on logical CPU #8
[Work thread Apr 19 17:40] Beginning a continuous self-test to check your computer.
[Work thread Apr 19 17:40] Please read stress.txt. Hit ^C to end this test.
[Work thread Apr 19 17:40] Setting affinity to run worker on logical CPU #2
[Work thread Apr 19 17:40] Setting affinity to run worker on logical CPU #7

...

[Work thread Apr 19 17:48] Test 4, 6500 Lucas-Lehmer iterations of M11796481 using Pentium4 FFT length 640K, Pass1=640, Pass2=1K.
[Work thread Apr 19 17:48] Test 4, 6500 Lucas-Lehmer iterations of M11796481 using Pentium4 FFT length 640K, Pass1=640, Pass2=1K.
[Work thread Apr 19 17:48] Test 4, 6500 Lucas-Lehmer iterations of M11796481 using Pentium4 FFT length 640K, Pass1=640, Pass2=1K.
[Work thread Apr 19 17:48] Test 4, 6500 Lucas-Lehmer iterations of M11796481 using Pentium4 FFT length 640K, Pass1=640, Pass2=1K.
[Work thread Apr 19 17:48] Test 4, 6500 Lucas-Lehmer iterations of M11796481 using Pentium4 FFT length 640K, Pass1=640, Pass2=1K.
[Work thread Apr 19 17:48] Test 4, 6500 Lucas-Lehmer iterations of M11796481 using Pentium4 FFT length 640K, Pass1=640, Pass2=1K.
[Work thread Apr 19 17:48] Test 4, 6500 Lucas-Lehmer iterations of M11796481 using Pentium4 FFT length 640K, Pass1=640, Pass2=1K.[/CODE]

[CODE][chalsall@hobbit cudalucas-code]$ ./CUDALucas -r

Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2048K
Starting M13466917 fft length = 2240K
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3584K
Starting M13466917 fft length = 3840K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 3840K
Starting M13466917 fft length = 4000K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4000K
Starting M13466917 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4096K
Starting M13466917 fft length = 4480K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4480K
Starting M13466917 fft length = 4608K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4608K
Starting M13466917 fft length = 4800K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 4800K
Starting M13466917 fft length = 5120K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 5120K
Starting M13466917 fft length = 5376K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 5376K
Starting M13466917 fft length = 5600K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 5600K
Starting M13466917 fft length = 5760K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 5760K
Starting M13466917 fft length = 6144K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 6144K
Starting M13466917 fft length = 6400K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 6400K
Starting M13466917 fft length = 6720K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 6720K
Starting M13466917 fft length = 6912K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 6912K
Starting M13466917 fft length = 7168K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 7168K
Starting M13466917 fft length = 7680K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 7680K
Starting M13466917 fft length = 8000K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 8000K
Starting M13466917 fft length = 8192K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 8192K
Starting M13466917 fft length = 8960K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 8960K
Starting M13466917 fft length = 9216K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 9216K
Starting M13466917 fft length = 9600K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 9600K
Starting M13466917 fft length = 10240K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 10240K
Starting M13466917 fft length = 10752K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 10752K
Starting M13466917 fft length = 11200K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 11200K
Starting M13466917 fft length = 11520K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 11520K
Starting M13466917 fft length = 12288K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 12288K
Starting M13466917 fft length = 12800K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 16 < 1000 && err = 0.50000 >= 0.35, increasing n from 12800K
The prime 13466917 is less than the fft length 13440. This will cause problems.
[/CODE]
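
For anyone reading along, the `err` figure in that output is the round-off error: after each floating-point FFT squaring, every element of the result should land very close to an integer, and the error is how far the worst element is from one. It can never exceed 0.5, so an `err` of 0.50000 at every FFT length up to 12800K suggests the data is garbage rather than the FFT merely being too small. Here is a minimal Python sketch of that check. The function names are made up for illustration, the 0.35 limit is simply the one printed in the log, and none of this is CUDALucas's actual code.

```python
def roundoff_error(values):
    """Return the largest distance from any value to its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def needs_larger_fft(values, limit=0.35):
    """Hypothetical check: True when the round-off error reaches the limit."""
    return roundoff_error(values) >= limit


# A healthy result sits almost exactly on integers.
healthy = [3.00001, 7.99998, 12.0]
# A corrupted result can sit halfway between integers (error 0.5).
corrupted = [3.5, 7.99998, 12.0]

print(needs_larger_fft(healthy))    # False
print(needs_larger_fft(corrupted))  # True
```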

chalsall 2013-04-19 22:09

Now let's stop the mprime test...

[CODE][Work thread Apr 19 17:57] Test 1, 800000 Lucas-Lehmer iterations of M172031 using FFT length 8K.
[Work thread Apr 19 17:59] Test 2, 800000 Lucas-Lehmer iterations of M163839 using FFT length 8K.
[Work thread Apr 19 17:59] Test 2, 800000 Lucas-Lehmer iterations of M163839 using FFT length 8K.
[Work thread Apr 19 17:59] Test 2, 800000 Lucas-Lehmer iterations of M163839 using FFT length 8K.
[Work thread Apr 19 17:59] Test 2, 800000 Lucas-Lehmer iterations of M163839 using FFT length 8K.
[Work thread Apr 19 17:59] Test 2, 800000 Lucas-Lehmer iterations of M163839 using FFT length 8K.
[Work thread Apr 19 17:59] Test 2, 800000 Lucas-Lehmer iterations of M163839 using FFT length 8K.
[Work thread Apr 19 18:00] Test 2, 800000 Lucas-Lehmer iterations of M163839 using FFT length 8K.
[Work thread Apr 19 18:00] Test 2, 800000 Lucas-Lehmer iterations of M163839 using FFT length 8K.
[Work thread Apr 19 18:01] Test 3, 800000 Lucas-Lehmer iterations of M159745 using FFT length 8K.
[Work thread Apr 19 18:01] Test 3, 800000 Lucas-Lehmer iterations of M159745 using FFT length 8K.
[Work thread Apr 19 18:01] Test 3, 800000 Lucas-Lehmer iterations of M159745 using FFT length 8K.
[Work thread Apr 19 18:02] Test 3, 800000 Lucas-Lehmer iterations of M159745 using FFT length 8K.
^C[Main thread Apr 19 18:02] Stopping all worker threads.
[Work thread Apr 19 18:02] Torture Test completed 8 tests in 21 minutes - 0 errors, 0 warnings.
[Work thread Apr 19 18:02] Worker stopped.
[Work thread Apr 19 18:02] Torture Test completed 8 tests in 21 minutes - 0 errors, 0 warnings.
[Work thread Apr 19 18:02] Worker stopped.
[Work thread Apr 19 18:02] Torture Test completed 7 tests in 21 minutes - 0 errors, 0 warnings.
[Work thread Apr 19 18:02] Worker stopped.
[Work thread Apr 19 18:02] Torture Test completed 7 tests in 21 minutes - 0 errors, 0 warnings.
[Work thread Apr 19 18:02] Worker stopped.
[Work thread Apr 19 18:02] Torture Test completed 7 tests in 21 minutes - 0 errors, 0 warnings.
[Work thread Apr 19 18:02] Worker stopped.
[Work thread Apr 19 18:02] Torture Test completed 7 tests in 21 minutes - 0 errors, 0 warnings.
[Work thread Apr 19 18:02] Worker stopped.
[Work thread Apr 19 18:02] Torture Test completed 8 tests in 21 minutes - 0 errors, 0 warnings.
[Work thread Apr 19 18:02] Worker stopped.
[Work thread Apr 19 18:02] Torture Test completed 8 tests in 21 minutes - 0 errors, 0 warnings.
[Work thread Apr 19 18:02] Worker stopped.
[Main thread Apr 19 18:02] Execution halted.
[/CODE]

Then let's immediately restart the CUDALucas test:

[CODE][chalsall@hobbit cudalucas-code]$ ./CUDALucas -r

Starting M86243 fft length = 6K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00001, max error = 0.00002
Iteration 200, average error = 0.00002, max error = 0.00002
Iteration 300, average error = 0.00002, max error = 0.00002
Iteration 400, average error = 0.00002, max error = 0.00002
Iteration 500, average error = 0.00002, max error = 0.00002
Iteration 600, average error = 0.00002, max error = 0.00002
Iteration 700, average error = 0.00002, max error = 0.00002
Iteration 800, average error = 0.00002, max error = 0.00002
Iteration 1000, average error = 0.00002 < 0.25 (max error = 0.00002), continuing test.
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 6K, CUDALucas v2.05 Alpha err = 0.00003 (0:01 real, 0.1093 ms/iter, ETA 0:07)
This residue is correct.

(Much snipping...)

Starting M1257787 fft length = 64K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration 100, average error = 0.07099, max error = 0.09375
Iteration 200, average error = 0.07910, max error = 0.10156
Iteration 300, average error = 0.08154, max error = 0.10156
Iteration 400, average error = 0.08322, max error = 0.10010
Iteration 500, average error = 0.08387, max error = 0.09473
Iteration 600, average error = 0.08504, max error = 0.10370
Iteration 700, average error = 0.08536, max error = 0.09766
Iteration 800, average error = 0.08528, max error = 0.09668
Iteration = 864 < 1000 && err = 0.50000 >= 0.35, increasing n from 64K
Starting M1257787 fft length = 72K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00401, max error = 0.00537
Iteration = 112 < 1000 && err = 0.50000 >= 0.35, increasing n from 72K
Starting M1257787 fft length = 80K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration 100, average error = 0.02841, max error = 0.20000
Iteration 200, average error = 0.01445, max error = 0.00053
Iteration 300, average error = 0.00980, max error = 0.00056
Iteration 400, average error = 0.00748, max error = 0.00055
Iteration 500, average error = 0.00608, max error = 0.00058
Iteration 600, average error = 0.00515, max error = 0.00056
Iteration 700, average error = 0.00449, max error = 0.00059
Iteration 800, average error = 0.00399, max error = 0.00059
Iteration 900, average error = 0.00360, max error = 0.00058
Iteration 1000, average error = 0.00329 < 0.25 (max error = 0.00055), continuing test.
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 80K, CUDALucas v2.05 Alpha err = 0.00068 (0:04 real, 0.3963 ms/iter, ETA 8:11)
This residue is correct.

Starting M1398269 fft length = 72K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration 100, average error = 0.05693, max error = 0.07715
Iteration 200, average error = 0.06307, max error = 0.07812
Iteration 300, average error = 0.06516, max error = 0.07812
Iteration 400, average error = 0.06662, max error = 0.08594
Iteration 500, average error = 0.06691, max error = 0.07422
Iteration 600, average error = 0.06706, max error = 0.07812
Iteration 700, average error = 0.06763, max error = 0.07812
Iteration 800, average error = 0.06767, max error = 0.07812
Iteration 900, average error = 0.06794, max error = 0.07812
Iteration 1000, average error = 0.06803 < 0.25 (max error = 0.07324), continuing test.
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 72K, CUDALucas v2.05 Alpha err = 0.09277 (0:03 real, 0.3746 ms/iter, ETA 8:36)
This residue is correct.

Starting M2976221 fft length = 160K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 80 < 1000 && err = 0.50000 >= 0.35, increasing n from 160K
Starting M2976221 fft length = 192K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 192K
Starting M2976221 fft length = 224K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00003, max error = 0.00004
Iteration 200, average error = 0.00003, max error = 0.00004
Iteration 300, average error = 0.00003, max error = 0.00004
Iteration 400, average error = 0.00003, max error = 0.00004
Iteration 500, average error = 0.00003, max error = 0.00004
Iteration 600, average error = 0.00003, max error = 0.00004
Iteration 700, average error = 0.00003, max error = 0.00004
Iteration 800, average error = 0.00003, max error = 0.00004
Iteration 900, average error = 0.00003, max error = 0.00004
Iteration 1000, average error = 0.00003 < 0.25 (max error = 0.00004), continuing test.
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 224K, CUDALucas v2.05 Alpha err = 0.00004 (0:07 real, 0.7905 ms/iter, ETA 38:59)
This residue is correct.

Starting M3021377 fft length = 160K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04457, max error = 0.05859
Iteration = 112 < 1000 && err = 0.49805 >= 0.35, increasing n from 160K
Starting M3021377 fft length = 192K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration = 48 < 1000 && err = 0.49858 >= 0.35, increasing n from 192K
Starting M3021377 fft length = 224K
Running careful round off test for 1000 iterations. If average error >= 0.22, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00004, max error = 0.00005
Iteration 200, average error = 0.00004, max error = 0.00005
Iteration 300, average error = 0.00004, max error = 0.00005
Iteration 400, average error = 0.00005, max error = 0.00005
Iteration 500, average error = 0.00005, max error = 0.00005
Iteration 600, average error = 0.00005, max error = 0.00005
Iteration 700, average error = 0.00005, max error = 0.00005
Iteration 800, average error = 0.00005, max error = 0.00006
Iteration 900, average error = 0.00005, max error = 0.00005
Iteration 1000, average error = 0.00005 < 0.25 (max error = 0.00006), continuing test.
Iteration = 1550 >= 1000 && err = 0.5 >= 0.35, fft length = 224K, writing checkpoint file (because -t is enabled) and exiting.

Iteration = 1550 >= 1000 && err = 0.5 >= 0.35, fft length = 224K, restarting from last checkpoint with increased fft length.

Iteration = 1550 >= 1000 && err = 0.5 >= 0.35, fft length = 240K, writing checkpoint file (because -t is enabled) and exiting.

Iteration = 1550 >= 1000 && err = 0.5 >= 0.35, fft length = 240K, restarting from last checkpoint with increased fft length.[/CODE]
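
For reference, the test being exercised here is the Lucas-Lehmer test: start with s = 4, repeat s = s*s - 2 modulo 2^p - 1 a total of p - 2 times, and M_p is prime exactly when the final s is 0. The hex values in the log appear to be the low 64 bits of s at iteration 10000. The sketch below does the same arithmetic with Python's exact integers, so it has no round-off error at all, which makes it a slow but trustworthy cross-check on small exponents. The function names are my own, and the way iterations are counted is an assumption that may not line up with CUDALucas's numbering, so don't expect interim residues to match the log digit for digit.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True  # M2 = 3 is prime; the loop below needs p >= 3
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def interim_residue(p, iterations):
    """Low 64 bits of s after the given number of squarings (hypothetical helper)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(iterations):
        s = (s * s - 2) % m
    return s & 0xFFFFFFFFFFFFFFFF


print([p for p in (3, 5, 7, 11, 13, 17, 19) if lucas_lehmer(p)])
# [3, 5, 7, 13, 17, 19]
```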

We can do this all day (and night) if you want....

Aramis Wyler 2013-04-19 22:18

Probably rules out heat, but could be a power draw problem or a problem with the motherboard. You'd really need someone with a similar rig to test that, or a bunch of spare parts.

Surely though, there are other people running mprime and cudalucas.

chalsall 2013-04-19 22:32

[QUOTE=Aramis Wyler;337650]Surely though, there are other people running mprime and cudalucas.[/QUOTE]

Sure.

My argument is that perhaps we're so trusting of Prime95/mprime that we're making an assumption that CUDALucas is correct.

Perhaps this assumption is not correct. (Perhaps it is.)

Always welcome to be proven wrong.

Trying to get to the truth....

kracker 2013-04-19 23:24

[QUOTE=Aramis Wyler;337650]Probably rules out heat, but could be a power draw problem or...[/QUOTE]

I believe the PSU might be the problem as well.

chalsall 2013-04-19 23:54

[QUOTE=kracker;337656]I believe the PSU might be the problem as well.[/QUOTE]

Possible. Not likely.

The mprime -t test has now been running for hours. No errors.

During my SIFing work the card (and the machine) draws more power than during the CUDALucas self-test.

The former works just fine for days with no errors; the latter reports errors within seconds. And every other test not based on the same code works just fine on the same hardware.

Again, I'm more than happy to be proven wrong. But the evidence seems to suggest that there [I]might[/I] be a bug in CUDALucas.

To be honest, I'm finding it a little funny how quickly some are jumping to unsupported conclusions....

Aramis Wyler 2013-04-20 02:08

Just looking around. Always start at the beginning. :) Is it plugged in? Etc. :)

kracker 2013-04-20 02:22

[QUOTE=Aramis Wyler;337660]... Always start at the beginning. :) Is it plugged in? Etc. :)[/QUOTE]

LOL
:devil:
:missingteeth: :missingteeth:

@chalsall: Are you sure the card is in your computer, instead of something like, say, your toaster?

NBtarheel_33 2013-04-20 05:45

[QUOTE=chalsall;337651]My argument is that perhaps we're so trusting of Prime95/mprime that we're making an assumption that CUDALucas is correct.[/QUOTE]

Isn't this a fair assumption to make, though, given the number of successful CuLu doublechecks that have been completed?

I mean, if CuLu has a bug...what are the odds of so many DCs matching results obtained from Prime95/mprime?
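
To put a rough number on those odds: a double-check counts as a match when the final 64-bit residues agree. If we assume a faulty run produces an essentially random residue (that is an assumption, not something shown in this thread), the chance of an accidental match is about 2^-64 per test. A quick Python sketch, with a made-up function name:

```python
import math


def false_match_probability(n_checks, residue_bits=64):
    """Chance that at least one of n_checks wrong results matches by luck.

    Assumes each wrong residue is uniformly random and independent.
    """
    per_test = 2.0 ** -residue_bits
    # 1 - (1 - per_test)**n_checks, computed without losing precision
    return -math.expm1(n_checks * math.log1p(-per_test))


print(false_match_probability(1))
print(false_match_probability(1_000_000))
```

Under that assumption, even a million faulty runs would produce an accidental match far less than once in a trillion tries.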

