mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

flashjh 2015-04-10 14:34

It looks like you may be building with code for another card. You need to make sure the makefile has the proper "arch=" and "code=" lines.

[STRIKE]What video card are you using? This will determine what you should have there.[/STRIKE]

Duh, it was in your post.

Make sure you have --generate-code arch=compute_50,code=sm_50

You could also use --generate-code arch=compute_52,code=sm_52, as that will work with a 980.
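For reference, the relevant lines in the mfaktc Makefile would look roughly like this (a sketch only; the NVCCFLAGS variable name is from memory and may differ between versions):

[CODE]# Maxwell: the GTX 970/980 are compute capability 5.2, but compute_50 code runs on them too
NVCCFLAGS += --generate-code arch=compute_50,code=sm_50
NVCCFLAGS += --generate-code arch=compute_52,code=sm_52[/CODE]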

TheJudger 2015-04-16 22:17

Hi (and sorry for late reply),


[QUOTE=preda;399753]After compiling 0.21 from source with Cuda toolkit 7.0, I consistently get this self-test failure (always the same number of tests passed/failed at the end). Any hints are appreciated, thanks.

[CODE]mfaktc v0.21 (64bit built)
[...]
GPU Sieving enabled
[...]
[/CODE][/QUOTE]

to debug this issue can you disable GPU sieving (mfaktc.ini) and rerun?
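For reference, disabling GPU sieving means something like this in mfaktc.ini (a sketch; the key name is from memory, check the comments in your ini file):

[CODE]# 0 = sieve on CPU, 1 = sieve on GPU (the default in 0.21)
SieveOnGPU=0[/CODE]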

Which CUDA version?
Which driver version?
Any chance to test CUDA toolkit 6.5?

Oliver

Ralf Recker 2015-05-09 13:31

[QUOTE=TheJudger;400251]to debug this issue can you disable GPU sieving (mfaktc.ini) and rerun?

Which CUDA version?
Which driver version?
Any chance to test CUDA toolkit 6.5?

Oliver[/QUOTE]
I get the same error here (mfaktc 0.20 and 0.21/CUDA toolkit 7.0/Driver versions 346.47, 346.59 and 349.16/Debian 8 and CentOS 7)
on a Maxwell card (GTX 970/compute_52/sm_52) but [B]not[/B] on a Kepler card (GTX 650/compute_30/sm_30).

Disabling GPU sieving doesn't help.

- A compiled binary of mfaktc 0.20 (CUDA toolkit 6.5/Debian 7) worked without problems on the GTX 970 (compute_52/sm_52).
- Compiled binaries of mfaktc 0.20 and 0.21 (CUDA toolkit 6.5/CentOS 7) work without problems on the GTX 970 (compute_52/sm_52).

- The binary from mersenne.ca (downloading from [URL="http://www.mersenneforum.org"]www.mersenneforum.org[/URL] is blocked) works without problems.

TheJudger 2015-05-15 21:33

Hi Ralf,

can you run the problematic binary with "-st" (just a few seconds) and tell me whether it fails for specific kernels or for all/"random" ones?

Oliver

Ralf Recker 2015-05-16 09:45

[QUOTE=TheJudger;402368]Hi Ralf,

can you run the problematic binary with "-st" (just a few seconds) and tell me whether it fails for specific kernels or for all/"random" ones?

Oliver[/QUOTE]

Here is the partial output from a -st run of mfaktc 0.21 on a GTX 970 (Debian 8.0/CUDA Toolkit 7.0).

[CODE]
Selftest statistics
number of tests 26192
successfull tests 15238
no factor found 10954

kernel | success | fail
-------------------+---------+-------
UNKNOWN kernel | 0 | 0
71bit_mul24 | 2586 | 0
75bit_mul32 | 1021 | 1661
95bit_mul32 | 1024 | 1843
barrett76_mul32 | 1096 | 0
barrett77_mul32 | 1114 | 0
barrett79_mul32 | 0 | 1153
barrett87_mul32 | 1066 | 0
barrett88_mul32 | 1069 | 0
barrett92_mul32 | 0 | 1084
75bit_mul32_gs | 997 | 1423
95bit_mul32_gs | 999 | 1598
barrett76_mul32_gs | 1079 | 0
barrett77_mul32_gs | 1096 | 0
barrett79_mul32_gs | 0 | 1130
barrett87_mul32_gs | 1044 | 0
barrett88_mul32_gs | 1047 | 0
barrett92_mul32_gs | 0 | 1062

selftest FAILED!
random selftest offset was: 9507477
[/CODE]The same -st run fails identically when built with an additional Makefile target for compute_52/sm_52:

[CODE]
Selftest statistics
number of tests 26192
successfull tests 15238
no factor found 10954

kernel | success | fail
-------------------+---------+-------
UNKNOWN kernel | 0 | 0
71bit_mul24 | 2586 | 0
75bit_mul32 | 1021 | 1661
95bit_mul32 | 1024 | 1843
barrett76_mul32 | 1096 | 0
barrett77_mul32 | 1114 | 0
barrett79_mul32 | 0 | 1153
barrett87_mul32 | 1066 | 0
barrett88_mul32 | 1069 | 0
barrett92_mul32 | 0 | 1084
75bit_mul32_gs | 997 | 1423
95bit_mul32_gs | 999 | 1598
barrett76_mul32_gs | 1079 | 0
barrett77_mul32_gs | 1096 | 0
barrett79_mul32_gs | 0 | 1130
barrett87_mul32_gs | 1044 | 0
barrett88_mul32_gs | 1047 | 0
barrett92_mul32_gs | 0 | 1062

selftest FAILED!
random selftest offset was: 5153388
[/CODE]

TheJudger 2015-05-16 10:24

Hi Ralf,

while you were writing this I was able to reproduce it on my system, too (GTX 980, CUDA 7.0, mfaktc 0.22-pre2).
I see [B]exactly[/B] the same numbers of failed and passed selftests (except for 71bit_mul24, which is obvious because that kernel was removed in 0.22), so at least the issue is easy to reproduce (and deterministic).

-- edit --
[CODE]#define DEBUG_GPU_MATH[/CODE] doesn't show anything... [I]"interesting"[/I]
-- edit --

Oliver

TheJudger 2015-05-17 18:51

Hi,

did some tests with the barrett79 kernel... it seems the differences are inside the main loop, mostly in the integer math.
Comparing PTX output of CUDA 6.5 vs. 7.0 isn't fun at all...

Oliver

TheJudger 2015-05-17 19:03

Testcase: M332195503 from 2[SUP]64[/SUP] to 2[SUP]65[/SUP] (there is a known factor in this range: 23099992436515618207); hacked the code to force barrett79 usage.
Left side: CUDA 6.5
Right side: CUDA 7.0
[CODE]Mooh Mooh
u = 0xCC6E77DC 0718B873 DABCC754 u = 0xCC6E77DC 0718B873 DABCC754
main loop start main loop start
tmp96 = 0x00000000 4B0C159B 8DA668D7 tmp96 = 0x00000000 4B0C159B 8DA668D7
a = 0x00000000 4B0C159B 8DA668D7 | a = 0x00000000 4B0C159[B][COLOR="Red"]A[/COLOR][/B] 8DA668D7
b = 0x00000000 1600153B 2D67ACC9 F13EF602 F7C36491 | b = 0x00000000 1600153A 974F8193 D5F22454 F7C36491
[/CODE]
[B][COLOR="Red"]WTF?[/COLOR][/B] Srsly? Reminds me of [URL="http://mersenneforum.org/showthread.php?p=306728&highlight=carry#post306728"]this[/URL]...

Oliver
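To illustrate what's at stake here: multi-word arithmetic in these kernels relies on PTX carry/borrow chains. A hypothetical 96-bit subtract (a sketch, not mfaktc's actual macros) looks like this, and a mishandled borrow between the first two instructions would produce exactly the kind of off-by-one in the middle word seen in the dump above:

[CODE]// sketch only: 96-bit r = a - b via a PTX borrow chain
__device__ void sub96(unsigned int *r, const unsigned int *a, const unsigned int *b)
{
    asm("sub.cc.u32  %0, %3, %6;\n\t"  /* r0 = a0 - b0, sets the borrow flag */
        "subc.cc.u32 %1, %4, %7;\n\t"  /* r1 = a1 - b1 - borrow              */
        "subc.u32    %2, %5, %8;"      /* r2 = a2 - b2 - borrow              */
        : "=r"(r[0]), "=r"(r[1]), "=r"(r[2])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]),
          "r"(b[0]), "r"(b[1]), "r"(b[2]));
}[/CODE]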

TheJudger 2015-05-18 17:40

OK, it's indeed a bug in CUDA 7.0 (and/or the drivers).

My latest development version runs a small check for this.

CUDA 6.5 + 346.72:
[CODE]./mfaktc.exe -v 2
mfaktc v0.22-pre3 (64bit built)
[...]
CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 7.0
[...]
check_subcc_bug()
input: mystuff->h_RES[2..0] = 0x33333333 22222222 11111111
output: mystuff->h_RES[5..3] = 0x33333333 22222222 11111111
passed, output == input
[...][/CODE]

CUDA 7.0 + 346.72:
[CODE]./mfaktc.exe
mfaktc v0.22-pre3 (64bit built)
[...]
CUDA version info
binary compiled for CUDA 7.0
CUDA runtime version 7.0
CUDA driver version 7.0
[...]
check_subcc_bug()
input: mystuff->h_RES[2..0] = 0x33333333 22222222 11111111
output: mystuff->h_RES[5..3] = 0x33333333 22222221 11111111
ERROR: output != input

could be caused by bad software environment (CUDA toolkit and/or graphics driver)
Known bad:
- CUDA 5.0.7RC + 302.06.03 with all supported GPUs
fixed by driver update after reporting this issue to nvidia
- CUDA 7.0 + 346.47, 346.59, 346.72 and 349.16 with Maxwell GPUs
[...]
[/CODE]
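For anyone who wants a standalone reproducer: something in the spirit of check_subcc_bug() can be written in a few lines (a sketch, not mfaktc's actual code; the kernel subtracts the 96-bit constant 0 through a sub.cc/subc chain, so on a correct toolchain the output must equal the input):

[CODE]#include <stdio.h>
#include <cuda_runtime.h>

// Subtract 96-bit zero from in[0..2] through a borrow chain.
// Nothing is actually subtracted, so no borrow should ever be set and out == in.
__global__ void subcc_check(const unsigned int *in, unsigned int *out)
{
    asm("sub.cc.u32  %0, %3, 0;\n\t"
        "subc.cc.u32 %1, %4, 0;\n\t"
        "subc.u32    %2, %5, 0;"
        : "=r"(out[0]), "=r"(out[1]), "=r"(out[2])
        : "r"(in[0]), "r"(in[1]), "r"(in[2]));
}

int main(void)
{
    unsigned int h_in[3]  = {0x11111111u, 0x22222222u, 0x33333333u};
    unsigned int h_out[3] = {0, 0, 0};
    unsigned int *d_in, *d_out;

    cudaMalloc((void **)&d_in,  sizeof(h_in));
    cudaMalloc((void **)&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    subcc_check<<<1, 1>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    if (h_out[0] == h_in[0] && h_out[1] == h_in[1] && h_out[2] == h_in[2])
        printf("passed, output == input\n");
    else
        printf("ERROR: output != input\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}[/CODE]On a broken toolchain/driver combination the middle word should come out one too low, matching the 0x22222221 in the output above.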

[I]check_subcc_bug()[/I] is silent unless[LIST][*]verbosity is greater than or equal to 2[*]the sub.cc bug is detected[/LIST]
Oliver

firejuggler 2015-05-18 18:17

So, Oliver, what do we do? What should we avoid? Who should avoid what?
Do we wait for you to fix it?

TheJudger 2015-05-18 18:33

[QUOTE=firejuggler;402552]So, Oliver, what do we do? What should we avoid? Who should avoid what?[/QUOTE]
It affects CUDA 7.0 + Maxwell-class GPUs, so just don't use that combination (if you try, the built-in selftest will refuse productive use). Right now I see no benefit of CUDA 7.0 over 6.5.

[QUOTE=firejuggler;402552]Do we wait for you to fix it?[/QUOTE]
Unless nvidia employs me and I learn how to write graphics drivers -> no (nvidia needs to fix it!)

Oliver

