[QUOTE=axn;337614]If you can, downclock the GPU memory and try. It is almost certainly a hardware problem. You can also get GeneferCUDA and do a test. It should also produce similar issues.[/QUOTE]
OK, thanks guys. I actually did also try GeneferCUDA -- it hung during its self test. Unfortunately I only do Linux... Under-clocking is not currently supported. :bangheadonwall:
[QUOTE=chalsall;337617]
Unfortunately I only do Linux... Under-clocking is not currently supported. :bangheadonwall:[/QUOTE] Oh poor you... Just kidding. I think TheJudger said the self test (-st) only tests the program (software) itself. [URL="http://mersenneforum.org/showpost.php?p=322384&postcount=13"]Source[/URL] CuLu is known for [I]very[/I] stressing the VRAM.. as many will tell you.
Second that. Taking the memory down 100-200 MHz will probably clear things up.
BUT -- I now see that that is not an option. Do I recall correctly that such control used to be possible, but that nVidia disabled "cool bits" in the Linux drivers?
[QUOTE=kladner;337626]I now see that that is not an option. Do I recall correctly that such control used to be possible, but that nVidia disabled "cool bits" [B][I][U]in the Linux drivers?[/U][/I][/B][/QUOTE]
Yes. There's a reason [URL="http://www.wired.com/wiredenterprise/2012/06/torvalds-nvidia-linux/"]Linus gave Nvidia the finger[/URL]... Just to make sure this regression hasn't been fixed in the latest "Short-Lived" branch of the drivers, I'm in the process of upgrading. But I'm not hopeful... Nothing in the documentation suggests it's been fixed. But I'm an empirical kind of guy... What is most likely happening here is that Nvidia wants those who use their GPUs for "compute" to pay big bucks for Teslas....
[QUOTE=kracker;337625]CuLu is known for [I]very[/I] stressing the VRAM.. as many will tell you.[/QUOTE]
Yes, but so are GPUBurn, memtest80, and gpuSIFT. I'm still not convinced there isn't a bug in CUDALucas. One experiment at a time....
You'd think, but they didn't disable Coolbits for Windows. It's right there in the Nvidia interface with a nice little slider bar.
ALSO: I may have mentioned it before, but I don't run CUDALucas or any of its derivatives because they cause my (watercooled) GPUs to make high-pitched screaming noises. This is said to be due to some sort of vibration caused by the way it moves data through the memory. No other program or any game causes my cards to scream like that.
It was never re-enabled, and judging by their MO, it never will be.
It would have been useful if they re-enabled the Coolbits code only for [I]down[/I]clocking. They are probably afraid that people would disassemble the binary, locate the place where the "<=" condition is tested, and wipe it out with NOPs. Of course, they could obfuscate, but they apparently don't want to be bothered.
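That NOP trick, as a toy sketch in Python (the opcodes and offsets below are purely illustrative -- this is not Nvidia's actual binary, just the general idea of clobbering a 2-byte x86 short conditional jump with two NOPs so the check never diverts control):

```python
def nop_out_jump(code, offset):
    """Overwrite a 2-byte x86 short conditional jump (e.g. JLE, opcode
    0x7E plus a rel8 displacement) at `offset` with two NOPs (0x90),
    so the comparison result is simply ignored."""
    patched = bytearray(code)
    patched[offset:offset + 2] = b"\x90\x90"
    return patched

# Toy "binary": CMP EAX,EBX (0x39 0xD8); JLE +5 (0x7E 0x05); padding NOPs.
code = bytearray(b"\x39\xd8\x7e\x05\x90\x90\x90\x90\x90")
patched = nop_out_jump(code, 2)  # the JLE at offset 2 is now two NOPs
```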
To put on the table why I think there [B][I][U]might[/U][/I][/B] be a bug in CUDALucas (and I hope I'm not talking out of school here owftheevil)...
owftheevil's program is a fork of CUDALucas. His program would always start reporting round-off errors after about 20,000 iterations. This same machine is also doing mprime P-1 work, using 3GB of RAM and all cores. And while I stopped mfaktc before I ran his program, I didn't stop mprime.

As an experiment yesterday, I stopped mprime and ran the program... It worked fine for 59,000 iterations. Hmmm... I then restarted mprime, and within seconds the GPU P-1 program started reporting round-off errors.

A CPU race condition in owftheevil's program could explain this behavior... And since owftheevil's program is a fork of CUDALucas....
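The classic lost-update pattern behind that hypothesis can be replayed deterministically. A minimal sketch (pure illustration of why extra CPU load, which changes thread timing, can expose such a bug -- this is not code from CUDALucas or owftheevil's fork):

```python
def interleaved_updates(start):
    """Deterministically replay the interleaving that loses an update:
    both workers read the shared value before either writes back, so
    one increment vanishes.  With real threads this only happens when
    the scheduler interleaves them just so -- e.g. under heavy CPU load."""
    shared = start
    a = shared       # worker A reads
    b = shared       # worker B reads before A has written back
    shared = a + 1   # A writes its increment
    shared = b + 1   # B's write clobbers A's: one increment is lost
    return shared
```

Two increments were issued, but the result is only one higher than the starting value -- exactly the kind of intermittent, load-dependent corruption described above.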
[QUOTE=chalsall;337632]A CPU race condition in owftheevil's program could explain this behavior... And since owftheevil's program is a fork of CUDALucas....[/QUOTE] I have successfully completed many double checks using CUDALucas on my GPUs, always with something else running on the CPU (not mprime, but usually NFS sieving, msieve filtering or LA, ecm, or others) with no issues. I also use Linux, so it doesn't appear to be a driver issue. I'm happy to give owftheevil's P-1 a try here if it's helpful.
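For context on what those double checks recompute: CUDALucas runs the Lucas-Lehmer primality test, using floating-point FFT squaring for speed -- which is where the round-off errors discussed above enter. A minimal integer-only sketch of the iteration itself (function name is mine, not from any of these programs):

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test for an odd prime p: M_p = 2^p - 1 is prime
    iff s == 0 after p-2 iterations of s -> s^2 - 2 (mod M_p),
    starting from s = 4.  CUDALucas performs exactly this iteration,
    but does the squaring with a floating-point FFT on the GPU."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0
```

For example, `lucas_lehmer(13)` confirms M_13 = 8191 is prime, while `lucas_lehmer(11)` correctly rejects M_11 = 2047 = 23 x 89.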
[QUOTE=frmky;337635]I'm happy to give owftheevil's P-1 a try here if it's helpful.[/QUOTE]
It would be. But I don't feel it would be appropriate for me to give you owftheevil's code without his permission. Something strange appears to be going on. Bad hardware on my part is certainly one possibility -- imperfect code (all derived from each other) is another. This is why science (and engineering) is so interesting! :smile:
The power supply could cause issues too. Or the temperature inside the chassis. If you stress both CPU and GPU, the system draws more power than when only the GPU is loaded.
Oliver
P.S. Keep in mind that I'm running my third GTX 680 (knocking on some wood three times)...
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.