It looks like you may be building code for another card. Make sure the Makefile has the proper "arch=" and "code=" lines.
[STRIKE]What video card are you using? This will determine what you should have there.[/STRIKE] Duh, it was in your post. Make sure you have --generate-code arch=compute_50,code=sm_50. You could also use arch=compute_52,code=sm_52, as that is the native target for a 980. |
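For reference, the relevant part of the Makefile looks roughly like this (a sketch only; variable names and layout differ between mfaktc versions, so check your own Makefile):

```make
# Sketch of the nvcc target-architecture flags (illustrative names).
# arch= selects the PTX (virtual) target, code= the SASS (real) target.
# A GTX 980 is Maxwell, compute capability 5.2, so sm_52 is native;
# sm_50 code also runs on it.
NVCC      = nvcc
NVCCFLAGS += --generate-code arch=compute_50,code=sm_50
NVCCFLAGS += --generate-code arch=compute_52,code=sm_52
```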
Hi (and sorry for late reply),
[QUOTE=preda;399753]After compiling 0.21 from source with Cuda toolkit 7.0, I consistently get this self-test failure (always the same number of tests passed/failed at the end). Any hints are appreciated, thanks. [CODE]mfaktc v0.21 (64bit built) [...] GPU Sieving enabled [...] [/CODE][/QUOTE] To debug this issue, can you disable GPU sieving (mfaktc.ini) and rerun? Which CUDA version? Which driver version? Any chance to test CUDA toolkit 6.5? Oliver |
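For anyone following along, GPU sieving is toggled in mfaktc.ini. The option name below is from memory and may differ between mfaktc versions, so verify it against the comments in your own mfaktc.ini:

```ini
; mfaktc.ini excerpt (sketch -- check your own mfaktc.ini for the
; exact option name and its documented values)
SieveOnGPU=0
```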
[QUOTE=TheJudger;400251]to debug this issue can you disable GPU sieving (mfaktc.ini) and rerun?
Which CUDA version? Which driver version? Any chance to test CUDA toolkit 6.5? Oliver[/QUOTE] I get the same error here (mfaktc 0.20 and 0.21/CUDA toolkit 7.0/Driver versions 346.47, 346.59 and 349.16/Debian 8 and CentOS 7) on a Maxwell card (GTX 970/compute_52/sm_52) but [B]not[/B] on a Kepler card (GTX 650/compute_30/sm_30). Disabling GPU sieving doesn't help.
- A compiled binary of mfaktc 0.20 (CUDA toolkit 6.5/Debian 7) worked without problems on the GTX 970 (compute_52/sm_52).
- Compiled binaries of mfaktc 0.20 and 0.21 (CUDA toolkit 6.5/CentOS 7) work without problems on the GTX 970 (compute_52/sm_52).
- The binary from mersenne.ca (downloading from [URL="http://www.mersenneforum.org"]www.mersenneforum.org[/URL] is blocked) works without problems. |
Hi Ralf,
can you run the problematic binary with "-st" (just a few seconds) and tell me whether it fails for specific kernels or fails all/"randomly"? Oliver |
[QUOTE=TheJudger;402368]Hi Ralf,
can you run the problematic binary with "-st" (just a few seconds) and tell me whether it fails for specific kernels or fails all/"randomly"? Oliver[/QUOTE] Here is the partial output from a -st run of mfaktc 0.21 on a GTX 970 (Debian 8.0/CUDA Toolkit 7.0). [CODE]Selftest statistics
  number of tests       26192
  successfull tests     15238
  no factor found       10954

  kernel             | success |  fail
  -------------------+---------+-------
  UNKNOWN kernel     |       0 |     0
  71bit_mul24        |    2586 |     0
  75bit_mul32        |    1021 |  1661
  95bit_mul32        |    1024 |  1843
  barrett76_mul32    |    1096 |     0
  barrett77_mul32    |    1114 |     0
  barrett79_mul32    |       0 |  1153
  barrett87_mul32    |    1066 |     0
  barrett88_mul32    |    1069 |     0
  barrett92_mul32    |       0 |  1084
  75bit_mul32_gs     |     997 |  1423
  95bit_mul32_gs     |     999 |  1598
  barrett76_mul32_gs |    1079 |     0
  barrett77_mul32_gs |    1096 |     0
  barrett79_mul32_gs |       0 |  1130
  barrett87_mul32_gs |    1044 |     0
  barrett88_mul32_gs |    1047 |     0
  barrett92_mul32_gs |       0 |  1062

selftest FAILED!

random selftest offset was: 9507477[/CODE] The same failures occur with an additional Makefile target for compute_52/sm_52: [CODE]Selftest statistics
  number of tests       26192
  successfull tests     15238
  no factor found       10954

  kernel             | success |  fail
  -------------------+---------+-------
  UNKNOWN kernel     |       0 |     0
  71bit_mul24        |    2586 |     0
  75bit_mul32        |    1021 |  1661
  95bit_mul32        |    1024 |  1843
  barrett76_mul32    |    1096 |     0
  barrett77_mul32    |    1114 |     0
  barrett79_mul32    |       0 |  1153
  barrett87_mul32    |    1066 |     0
  barrett88_mul32    |    1069 |     0
  barrett92_mul32    |       0 |  1084
  75bit_mul32_gs     |     997 |  1423
  95bit_mul32_gs     |     999 |  1598
  barrett76_mul32_gs |    1079 |     0
  barrett77_mul32_gs |    1096 |     0
  barrett79_mul32_gs |       0 |  1130
  barrett87_mul32_gs |    1044 |     0
  barrett88_mul32_gs |    1047 |     0
  barrett92_mul32_gs |       0 |  1062

selftest FAILED!

random selftest offset was: 5153388[/CODE] |
Hi Ralf,
while you wrote this I was able to reproduce it on my system, too (GTX 980, CUDA 7.0, mfaktc 0.22-pre2). I see [B]exactly[/B] the same numbers of failed and passed selftests (except for 71bit_mul24, which is obvious because this kernel is removed in 0.22), so at least the issue is easy to reproduce (and static). -- edit -- [CODE]#define DEBUG_GPU_MATH[/CODE] doesn't show anything... [I]"interesting"[/I] -- edit -- Oliver |
Hi,
did some tests with the barrett79 kernel... it seems the differences show up inside the main loop, mostly in the integer arithmetic. Comparing PTX output of CUDA 6.5 vs. 7.0 isn't fun at all... Oliver |
Testcase: M332195503 from 2[SUP]64[/SUP] to 2[SUP]65[/SUP] (there is a known factor in this range: 23099992436515618207); hacked the code to force barrett79 usage.
Left side: CUDA 6.5, right side: CUDA 7.0 [CODE]Mooh                                                 Mooh
u = 0xCC6E77DC 0718B873 DABCC754                     u = 0xCC6E77DC 0718B873 DABCC754
main loop start                                      main loop start
tmp96 = 0x00000000 4B0C159B 8DA668D7                 tmp96 = 0x00000000 4B0C159B 8DA668D7
a = 0x00000000 4B0C159B 8DA668D7                   | a = 0x00000000 4B0C159[B][COLOR="Red"]A[/COLOR][/B] 8DA668D7
b = 0x00000000 1600153B 2D67ACC9 F13EF602 F7C36491 | b = 0x00000000 1600153A 974F8193 D5F22454 F7C36491[/CODE] [B][COLOR="Red"]WTF?[/COLOR][/B] Srsly? Reminds me of [URL="http://mersenneforum.org/showthread.php?p=306728&highlight=carry#post306728"]this[/URL]... Oliver |
OK, indeed a bug with CUDA 7.0 (and/or drivers).
My latest development version runs a small check for this. CUDA 6.5 + 346.72: [CODE]./mfaktc.exe -v 2
mfaktc v0.22-pre3 (64bit built)
[...]
CUDA version info
  binary compiled for CUDA  6.50
  CUDA runtime version      6.50
  CUDA driver version       7.0
[...]
check_subcc_bug() input:  mystuff->h_RES[2..0] = 0x33333333 22222222 11111111
                  output: mystuff->h_RES[5..3] = 0x33333333 22222222 11111111
                  passed, output == input
[...][/CODE] CUDA 7.0 + 346.72: [CODE]./mfaktc.exe
mfaktc v0.22-pre3 (64bit built)
[...]
CUDA version info
  binary compiled for CUDA  7.0
  CUDA runtime version      7.0
  CUDA driver version       7.0
[...]
check_subcc_bug() input:  mystuff->h_RES[2..0] = 0x33333333 22222222 11111111
                  output: mystuff->h_RES[5..3] = 0x33333333 22222221 11111111
                  ERROR: output != input
could be caused by bad software environment (CUDA toolkit and/or graphics driver)
Known bad:
- CUDA 5.0.7RC + 302.06.03 with all supported GPUs
  (fixed by driver update after this issue was reported to nvidia)
- CUDA 7.0 + 346.47, 346.59, 346.72 and 349.16 with Maxwell GPUs
[...][/CODE] [I]check_subcc_bug()[/I] is silent unless [LIST][*]verbosity is greater than or equal to 2[*]the sub.cc bug is detected[/LIST] Oliver |
So, Oliver, what do we do? What should we avoid? Who should avoid what?
Do we wait for you to fix it? |
[QUOTE=firejuggler;402552]So, Oliver, what do we do? What should we avoid? Who should avoid what?[/QUOTE]
So it affects CUDA 7.0 + Maxwell class GPUs; just don't use this combination (if you try to, the builtin selftest will deny productive usage). Right now I see no benefit of CUDA 7.0 over 6.5. [QUOTE=firejuggler;402552]Do we wait for you to fix it?[/QUOTE] Unless nvidia employs me and I learn how to write graphics drivers -> no (nvidia needs to fix it!) Oliver |