[url]http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf[/url]
I hope someone tries it.
[QUOTE=msft;256179][URL="http://www.cs.berkeley.edu/%7Evolkov/volkov10-GTC.pdf"]http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf[/URL]
I hope someone try.[/QUOTE] I'll probably do, when I'm not busy trying to further improve the performance of tpsieve on my GTX 460 card with with the 270.26 beta drivers :smile: |
Well, it's been a while since I posted anything here...though I suppose that's a good thing, since llrCUDA had been faithfully crunching away on NPLB's megabit drive for at least a month. :smile:
A few weeks ago, I stopped the GPU's LLRnet client (that is, a standard Linux LLRnet setup with llrCUDA substituted for regular LLR) to play around with mfaktc a bit. Just today I attempted to restart LLRnet, but llrCUDA started giving me roundoff errors:

[code]
+----------------------------------------+
| LLRnet client v0.9b7 with LLR v3.8.0   |
| M. Dettweiler and G. Barnes            |
| 2010-03-30, version 0.71               |
+----------------------------------------+
Starting Lucas Lehmer Riesel prime test of 315*2^1327549-1
Using real irrational base DWT, FFT length = 262144
V1 = 11 ; Computing U0...
V1 = 11 ; Computing U0...done.
Starting Lucas-Lehmer loop...
Iter: 1/1327548, ERROR: ROUND OFF (0.4999984503) > 0.4
Continuing from last save file.
Starting Lucas Lehmer Riesel prime test of 315*2^1327549-1
Using real irrational base DWT, FFT length = 262144
V1 = 11 ; Computing U0...
V1 = 11 ; Computing U0...done.
Starting Lucas-Lehmer loop...
Iter: 1/1327548, ERROR: ROUND OFF (0.4999998212) > 0.4
Continuing from last save file.
Unrecoverable error, Restarting with next larger FFT length...
Segmentation fault
LLR exited; assuming user stopped with Ctrl-C.
[/code]

Quite oddly, it also has a similar problem when I attempt to test a number that llrCUDA successfully completed a few weeks ago:

[code]
Starting Lucas Lehmer Riesel prime test of 333*2^1302446-1
Using real irrational base DWT, FFT length = 524288
V1 = 5 ; Computing U0...
V1 = 5 ; Computing U0...done.
Starting Lucas-Lehmer loop...
Iter: 1/1302445, ERROR: ROUND OFF (0.4999994338) > 0.4
Continuing from last save file.
Starting Lucas Lehmer Riesel prime test of 333*2^1302446-1
Using real irrational base DWT, FFT length = 524288
V1 = 5 ; Computing U0...
V1 = 5 ; Computing U0...done.
Starting Lucas-Lehmer loop...
Iter: 1/1302445, ERROR: ROUND OFF (0.4999997616) > 0.4
Continuing from last save file.
Unrecoverable error, Restarting with next larger FFT length...
Segmentation fault
[/code]

Anybody have a clue as to what's going on here?
It's as if the GPU just suddenly stopped working with llrCUDA: it still runs mfaktc fine, but not llrCUDA.

Thanks, Max :smile:
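For what it's worth, the "ERROR: ROUND OFF (...) > 0.4" figure comes from the FFT-based squaring at the heart of these programs: the convolution is done in floating point, its entries should be (near-)integers, and the reported error is the worst deviation from the nearest integer. A minimal CPU-side sketch of the idea in NumPy (nothing to do with the actual gwnum/CUDA internals, just the principle):

```python
import numpy as np

def fft_square_with_error(digits):
    # Square a number given as a little-endian digit array via an
    # FFT-based convolution. The raw product entries should be
    # integers; their maximum deviation from the nearest integer is
    # the round-off error that gets compared against the 0.4 threshold.
    n = 2 * len(digits)                  # room for the full product
    f = np.fft.rfft(digits, n)
    raw = np.fft.irfft(f * f, n)         # floating-point convolution
    err = float(np.max(np.abs(raw - np.round(raw))))
    return np.round(raw).astype(np.int64), err

# 123 in base 10, little-endian: [3, 2, 1]
coeffs, err = fft_square_with_error(np.array([3.0, 2.0, 1.0]))
value = sum(int(c) * 10**i for i, c in enumerate(coeffs))  # recombine
# value == 15129 == 123**2, and err is tiny on healthy hardware
```

When the hardware (or an over-aggressive clock) flips bits in those floats, the error climbs toward 0.5 and the result is garbage, which is exactly the symptom in the logs above.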
I just tried the same GPU with CUDALucas, and it doesn't work either:
[code]
err = 2.60365e+141, increasing n from 1048576
err = 9.04168e+133, increasing n from 1048576
err = 0.499999, increasing n from 1048576
err = 3.47935e+134, increasing n from 1048576
err = 2.1343e+133, increasing n from 1048576
err = 2.24737e+133, increasing n from 1048576
err = 1.41092e+135, increasing n from 1048576
err = 4.49846e+133, increasing n from 1048576
err = 462220, increasing n from 1048576
err = 1.40386e+135, increasing n from 1048576
err = 1.11876e+133, increasing n from 2097152
err = 1.90002e+139, increasing n from 2097152
err = 2.28388e+133, increasing n from 2097152
err = 2.04579e+135, increasing n from 2097152
err = 8.90231e+133, increasing n from 2097152
err = 8.41655e+287, increasing n from 4194304
err = 0.5, increasing n from 4194304
err = 1.96641e+208, increasing n from 8388608
^CCUDALucas.cu(526) : cufftSafeCall() CUFFT error.
[/code]

It would appear that this GPU is no longer "prime stable". :sick: nvidia-smi reports that the temperature is 42 °C, well within the normal range, so that rules out overheating as a cause. Does anyone have an idea of what else might be causing this instability?

Thanks, Max :smile:
Max,
Was the GPU overclocked/overvolted? What was the temperature when it was running llrCUDA and everything was fine 'n' dandy? Any driver upgrades? And what's the GPU again? I can offer some stability-testing utilities, even for Linux, if you want.
1 Attachment(s)
[QUOTE=Karl M Johnson;262098]Max,
Was the GPU overclocked/overvolted? What was the temperature when it was running llrCUDA and everything was fine 'n' dandy? Any driver upgrades? And what's the GPU again? I can offer some stability-testing utilities, even for Linux, if you want.[/QUOTE] The [URL="http://www.newegg.com/Product/Product.aspx?Item=N82E16814125334"]GPU[/URL] is a GTX 460 (768MB), factory overclocked to 715 MHz (stock 675 MHz). I don't recall the exact temperature when it was successfully running llrCUDA, but if I remember correctly it never got above 75 °C. Interestingly enough, when I tried it again just now, llrCUDA gave me a somewhat more informative error--it said the error was reproducible and thus not a hardware issue. :confused: Note that the GPU had been idle for a while before this, and since llrCUDA only ran for about 20 seconds before finally crashing, the temperature never went over 45 °C, so heat is definitely not the issue. (For some reason, the GPU machine refuses to copy any text out of a console window right now, so I've attached a screenshot instead.)

The confusing thing about all this is that I don't always get exactly the same error. Often the differences in console output were very slight (sometimes it said "Segmentation fault", sometimes not, among other small differences), so I dismissed them as artifacts of the way I was redirecting the output to a file (which does sometimes lose certain types of error messages). Now that I see this "reproducible" message, though, I have absolutely no idea what's going on...

Hope this helps, Max :smile:
I'd do a memory test - I'm referring to your main memory.
-- Craig
[QUOTE=nucleon;262113]I'd do a memory test - I'm referring to your main memory.
-- Craig[/QUOTE] I'm assuming you mean the "CPU" memory (RAM)--that I should run memtest86+ or the like on it? (I don't have physical access to the machine, so I'd have to have the owner run the test himself; he's not as familiar with how to do these things, so I'd prefer something I can do remotely, if it's at all possible. Would a Prime95 blend test do, you think?)

FYI, the same machine is also running four LLRnet clients on the CPU, and none of those have been crashing. That said, they could still be silently producing bad residues. :sick:
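If rebooting into memtest86+ isn't an option, one weaker possibility (besides a Prime95 blend run) is a userspace pattern test: write pseudorandom data to a large buffer and check that it reads back intact. A toy sketch of the idea in Python/NumPy -- far less thorough than memtest86+, since the OS decides which physical pages you get, but it runs over SSH with no reboot:

```python
import numpy as np

def quick_ram_check(mib=256, passes=3, seed=0):
    # Fill a large buffer with pseudorandom 64-bit words, copy it
    # (forcing a second full write of the region), and verify the two
    # copies agree. A mismatch suggests flaky RAM or memory controller.
    rng = np.random.default_rng(seed)
    words = mib * 1024 * 1024 // 8
    for _ in range(passes):
        pattern = rng.integers(0, 2**63, size=words, dtype=np.int64)
        mirror = pattern.copy()
        if not np.array_equal(pattern, mirror):
            return False
    return True
```

On a healthy machine `quick_ram_check()` returns True; sizing `mib` close to the installed RAM makes it exercise more of the address space.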
As an interesting aside, I just tried running the mfaktc self-test on the GPU in question, and it worked just fine. So it seems that it's at least "TF stable" if not "LL/LLR stable". (If that helps at all in figuring out what's up...)
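That split actually makes some sense: trial factoring in mfaktc is pure integer arithmetic, while LL/LLR squarings go through floating-point FFTs, so marginal hardware can pass one and fail the other. The integer side is easy to illustrate (plain Python here, not mfaktc's actual kernels):

```python
def divides_mersenne(q, p):
    # Trial factoring asks: does q divide 2^p - 1? Equivalently,
    # is 2^p congruent to 1 (mod q)? This is exact integer modular
    # arithmetic -- there is no rounding error that can silently
    # drift, unlike the floating-point FFT squarings in LL/LLR.
    return pow(2, p, q) == 1

# 2^11 - 1 = 2047 = 23 * 89, so 23 is a factor:
assert divides_mersenne(23, 11)
assert not divides_mersenne(7, 11)
```

An integer unit either works or fails loudly; the FFT path can sit right at the 0.4 error threshold and only misbehave under marginal conditions.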
1 Attachment(s)
Here are some thoughts:
1. Try downclocking the GPU to stock clocks and see if there's any change. Then try downclocking it even further below stock. Changes? This might shed some light on whether the GPU's OC potential has degraded (unlikely) and/or it's unstable because of the GPU voltage.
2. If it's a GPU VRAM issue, you can try stressing its memory with [URL="https://simtk.org/project/xml/downloads.xml?group_id=385#package_id633"]MemtestCL[/URL].
3. It's very unlikely to be an llrCUDA-only issue, and to prove/disprove this you can try stress testing the GPU with this little tool I've attached. It's written in OpenCL and is open source: it calculates some random passwords on the CPU, does the same on the GPU, then verifies the results. Binaries for win32/win64 and linux32/linux64 are already included. It's not as stressful as FurMark/OCCT, but it may show instability if it exists.
4. I have no idea if FurMark/OCCT runs under Wine, but those tools will surely bring the GPU to its knees if it's unstable.
5. Indeed, it would be wise to run [url=http://www.memtest.org/]memtest[/url] to stress the PC's RAM, just to be sure.
6. Stress testing the CPU with Prime95 would reveal whether or not it's a CPU/RAM issue.

That's all I can muster for now.
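The compute-then-verify pattern that the attached tool uses (item 3) is simple to sketch; `device_fn` below is a hypothetical stand-in for whatever the GPU kernel returns, and SHA-256 stands in for the tool's password hashing:

```python
import hashlib

def verify_against_reference(inputs, device_fn):
    # For each input, compute a trusted CPU reference and compare it
    # with the device's answer. Any mismatch indicates instability
    # somewhere in the device's compute path.
    bad = []
    for x in inputs:
        ref = hashlib.sha256(x).hexdigest()
        if device_fn(x) != ref:
            bad.append(x)
    return bad

# With a "device" that just reuses the CPU path, nothing mismatches:
ok = verify_against_reference([b"a", b"b"],
                              lambda x: hashlib.sha256(x).hexdigest())
```

The nice property of this approach over FurMark-style load tests is that it doesn't just heat the card up -- it actually catches silently wrong answers.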
[QUOTE=Karl M Johnson;262137]Here are some thoughts:
1. Try downclocking the GPU to stock clocks and see if there's any change. Then try downclocking it even further below stock. Changes? This might shed some light on whether the GPU's OC potential has degraded (unlikely) and/or it's unstable because of the GPU voltage.
2. If it's a GPU VRAM issue, you can try stressing its memory with [URL="https://simtk.org/project/xml/downloads.xml?group_id=385#package_id633"]MemtestCL[/URL].
3. It's very unlikely to be an llrCUDA-only issue, and to prove/disprove this you can try stress testing the GPU with this little tool I've attached. It's written in OpenCL and is open source: it calculates some random passwords on the CPU, does the same on the GPU, then verifies the results. Binaries for win32/win64 and linux32/linux64 are already included. It's not as stressful as FurMark/OCCT, but it may show instability if it exists.
4. I have no idea if FurMark/OCCT runs under Wine, but those tools will surely bring the GPU to its knees if it's unstable.
5. Indeed, it would be wise to run [url=http://www.memtest.org/]memtest[/url] to stress the PC's RAM, just to be sure.
6. Stress testing the CPU with Prime95 would reveal whether or not it's a CPU/RAM issue.

That's all I can muster for now.[/QUOTE] Okay, thanks--I'll give those a try and report back here with the results. (I must say, though, that I have no clue how to overclock/underclock a GPU; I'll try the memory and stress tests first to see if the source of the problem can be determined without having to change the clock speed.)