mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

Rodrigo 2015-02-21 05:32

1 Attachment(s)
[QUOTE=kladner;395921]Norton Internet Security gave the same complaint. I believe that such flags are based on the application being unknown in the Norton Community database. There are no direct heuristics indicating malware aside from the file having very restricted distribution.[/QUOTE]
Huh, now this is really strange. MFAKTC gets the seal of approval from Norton 360's "File Insight" feature (see attachment), while N360 itself has never flagged it on my PC during a scan.

So NPE and NIS dislike it, while N360 and File Insight are OK with it.

Maybe these various Symantec applications are maintained by rival teams... :smile:

Rodrigo

kladner 2015-02-21 05:53

[QUOTE=Rodrigo;395967]Huh, now this is really strange. MFAKTC gets the seal of approval from Norton 360's "File Insight" feature (see attachment), while N360 itself has never flagged it on my PC during a scan.

So NPE and NIS dislike it, while N360 and File Insight are OK with it.

Maybe these various Symantec applications are maintained by rival teams... :smile:

Rodrigo[/QUOTE]

I only had problems with the 0.21 files. In the first instance, it gave dire warnings, but allowed me to authorize their use. The next time (I was testing 32bit vs 64bit), it horned in and quarantined the file. Through dogged insistence I managed to make NIS disgorge its prey. I played this game a few times before I beat down Norton's resistance.

swl551 2015-03-02 01:16

Problem processing worktodo.add
 
I'm working on adding support for worktodo.add to MISFIT and I came across a problem with mfaktc.

During startup of mfaktc, if worktodo.txt has no rows, the program exits instead of proactively pulling in rows from worktodo.add.

So if worktodo.txt runs dry, it is impossible to get mfaktc restarted without first manually moving data out of the .add file.

I think during startup mfaktc should check for the .add file and process it if it exists.


Scott

TheMawn 2015-03-02 01:34

[QUOTE=swl551;396777]I'm working on adding support for worktodo.add to MISFIT and I came across a problem with mfaktc.

During startup of mfaktc, if worktodo.txt has no rows, the program exits instead of proactively pulling in rows from worktodo.add.

So if worktodo.txt runs dry, it is impossible to get mfaktc restarted without first manually moving data out of the .add file.

I think during startup mfaktc should check for the .add file and process it if it exists.


Scott[/QUOTE]

Does the program not have an emergency dump-from-staging-file routine?

swl551 2015-03-02 13:07

[QUOTE=TheMawn;396778]Does the program not have an emergency dump-from-staging-file routine?[/QUOTE]

TheJudger would have to answer, but it appears it does not read from the .add file in an "emergency".

TheJudger 2015-03-02 17:38

Hi Scott (and others as well)!
[LIST][*]Always add worktodo.add to worktodo.txt on startup? Yes, why not (read: good idea, I'll do this in the next release).[*]Add worktodo.add to worktodo.txt on [I]"emergency"[/I]? What is an [I]"emergency"[/I]? Processed everything from worktodo.txt? Well, I don't feel comfortable with "add worktodo.add to worktodo.txt" in that case; this would break the whole idea of worktodo.add. Imagine only one exponent left in worktodo.txt and StopAfterFactor=2 (mfaktc.ini): while you edit worktodo.add, a factor is found... Same as editing worktodo.txt, isn't it?[/LIST]
Oliver
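
The startup merge agreed on in the first bullet can be sketched roughly as follows. This is a minimal illustration in Python, not mfaktc's actual implementation (mfaktc is written in C/CUDA); the function names are hypothetical and the sample work lines in the usage note are only illustrative. It appends every non-empty line of worktodo.add to worktodo.txt and then removes the .add file, and it is meant to run once at startup only, which avoids the mid-run race described in the second bullet.

```python
import os


def _needs_newline(path):
    """True if path exists, is non-empty and does not end with a newline."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        return f.read(1) != b"\n"


def merge_add_file(worktodo="worktodo.txt", add_file="worktodo.add"):
    """Append the non-empty lines of add_file to worktodo, then delete add_file.

    Returns the number of lines moved (0 if add_file does not exist).
    """
    if not os.path.exists(add_file):
        return 0
    with open(add_file, "r") as src:
        lines = [line.strip() for line in src if line.strip()]
    # Avoid gluing the first new line onto an unterminated last line.
    prefix = "\n" if _needs_newline(worktodo) else ""
    with open(worktodo, "a") as dst:
        dst.write(prefix + "".join(line + "\n" for line in lines))
    os.remove(add_file)
    return len(lines)
```

Calling `merge_add_file()` before the work file is first read would let an instance with an empty worktodo.txt pick up staged assignments instead of exiting.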

TheMawn 2015-03-02 18:33

[QUOTE=swl551;396788]The Judger would have to answer, but it appears it does not read from the .add file in an "Emergency"[/QUOTE]

No, I was talking about Misfit. Does it not dump whatever is in the staging file if worktodo.txt falls below a certain threshold?

swl551 2015-03-02 19:43

[QUOTE=TheMawn;396819]No, I was talking about Misfit. Does it not dump whatever is in the staging file if worktodo.txt falls below a certain threshold?[/QUOTE]

I am working to implement support for .add where misfit will not load directly into live work files.

TheMawn 2015-03-03 01:34

When MISFIT detects a stalled instance, would it be appropriate for it to try to determine if the worktodo.txt is empty, and if it is, automatically transfer work (possibly from worktodo.add) and automatically attempt to restart the instance?

This is assuming the control codes are correctly jiggered.

EDIT: I make this suggestion assuming the lack of any better way for MISFIT to determine if that is the reason mfaktx died.

EDIT: Again I don't know if this is the kind of functionality we want on the MISFIT level. Perhaps this is something better off being fixed at the mfaktx level (i.e. first thing when the program is run, merge worktodo.add with worktodo.txt)

swl551 2015-03-03 02:17

[QUOTE=TheMawn;396849]When MISFIT detects a stalled instance, would it be appropriate for it to try to determine if the worktodo.txt is empty, and if it is, automatically transfer work (possibly from worktodo.add) and automatically attempt to restart the instance?

This is assuming the control codes are correctly jiggered.

EDIT: I make this suggestion assuming the lack of any better way for MISFIT to determine if that is the reason mfaktx died.

EDIT: Again I don't know if this is the kind of functionality we want on the MISFIT level. Perhaps this is something better off being fixed at the mfaktx level (i.e. first thing when the program is run, merge worktodo.add with worktodo.txt)[/QUOTE]


A primary goal of MISFIT is to never allow your installations to run out of work. It is possible to configure MISFIT and let everything run for months without human intervention (wait... install those windows patches every month!) If work is running out you have not configured MISFIT "correctly" -- use the work calculator.


As for restarting stalled instances.... The ONLY time I have had instances stall is due to overclocking, and upon restart I found that the clock speed is always a paltry 420 MHz. If MISFIT restarted the instance instead of alarming, you could be running crippled and not know it. Also, if you have lots of stalls, you have something misconfigured with your card or a defective card. It is possible for MISFIT to do more than it does, but coding is a lot of work......

TheMawn 2015-03-03 03:26

Fair enough. To be clear, I am having no issues at all. Haven't had to muck around with the GPUs in weeks. I just saw you mention you were working with MISFIT when you encountered the issue where the empty worktodo.txt would prevent mfaktx from ever running and I was just bouncing ideas around in case you were actively trying to add some functionality to MISFIT to deal with that situation.

It's perfectly fine that you don't. In fact in my (very limited) coding experience, I find the "niche" cases to not be worth dealing with.

vsuite 2015-04-03 17:32

What is the advantage of mfaktc .21 over .20 please?

What is the advantage of 6.5 CUDA over 4.2?

Using a GTX 460 running 7.0 (compute capability 2.1) and a 640 running 6.0 (compute capability 3.0).

Thanks.

James Heinrich 2015-04-03 18:05

[QUOTE=vsuite;399291]What is the advantage of mfaktc .21 over .20 please?[/QUOTE][code]version 0.21 (2015-02-17)
- added support for Wagstaff numbers: (2^p + 1)/3
- added support for "worktodo.add"
- enabled GPU sieving on CC 1.x GPUs
- dropped lower limit for exponents from 1,000,000 to 100,000
- rework selftest (-st and -st2), both now test ALL testcases, -st narrowed the searchspace (k_min < k_factor < k_max) to speedup the selftest.
- added random offset for selftest, this might detect bugs in sieve code which a static offset wouldn't find because we always test the same value.
- fixed a bug where mfaktc runs out of shared memory (GPU sieve), might be the cause for some reported (but never reproduced?) crashes. This occurs when you
  - have a GPU with relative small amount of shared memory
  - have a LOW value for GPUSievePrimes
  - have a BIG value for GPUSieveSize
- fixed a bug when GPUSieveProcessSize is set to 24 AND GPUSieveSize is not a multiple of 3 there was a relative small chance to ignore a factor.
- fixed a bug in SievePrimesAdjust causing SievePrimes where lowered to SievePrimesMin for very short running jobs
- added missing dependencies to Windows Makefiles
- (possible) speedups
  - funnel shift for CC 3.5 and above
  - slightly faster integer division for barrett_76,77,79 kernels
- lots of cleanups and removal of duplicate code
- print per-kernel-stats for selftest "-st" and "-st2"[/code]

Mark Rose 2015-04-03 18:48

[QUOTE=James Heinrich;399293][code] - slightly faster integer division for barrett_76,77,79 kernels[/code][/QUOTE]

That works out to about 1.5% ± 0.5% from what I've seen.

James Heinrich 2015-04-03 20:28

[QUOTE=James Heinrich;399293]- enabled GPU sieving on CC 1.x GPUs
- dropped lower limit for exponents from 1,000,000 to 100,000[/QUOTE]The major feature addition for me was GPU sieving below 2[sup]64[/sup], but that didn't make it to the changelog for some reason.

vsuite 2015-04-03 20:55

Thanks much.

My XP Core 2 Quad reports the .21 win32 app as not a valid Win32 application. [Also the 5.5, 6.0 and 6.5 CudaLucas, but not the 5.0 or 4.2 CudaLucas.] It should not be filesize.

James Heinrich 2015-04-03 21:04

Download again?
[url]http://download.mersenne.ca/mfaktc/mfaktc-0.21/mfaktc-0.21.win.cuda65.zip[/url]

vsuite 2015-04-03 21:14

Thanks again.

preda 2015-04-09 15:08

mfaktc 0.21 selftest fails, CUDA 7.0 on Linux
 
After compiling 0.21 from source with CUDA toolkit 7.0, I consistently get this self-test failure (always the same number of tests passed/failed at the end). Any hints are appreciated, thanks.

[CODE]mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 30s
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID (none)
ComputerID (none)
AllowSleep no
TimeStampInResults no

CUDA version info
binary compiled for CUDA 7.0
CUDA runtime version 7.0
CUDA driver version 7.0

CUDA device info
name GeForce GTX 980
compute capability 5.2
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 16
CUDA cores per MP 128
CUDA cores - total 2048
clock rate (CUDA cores) 1215MHz
memory clock rate: 3505MHz
memory bus width: 256 bit

Automatic parameters
threads per grid 1048576
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

running a simple selftest...
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M3321928703
no factor found
ERROR: selftest failed for M3321928703
no factor found
ERROR: selftest failed for M3321928703
no factor found
ERROR: selftest failed for M3321928703
no factor found
ERROR: selftest failed for M3321931973
no factor found
ERROR: selftest failed for M3321931973
no factor found
ERROR: selftest failed for M3321928619
no factor found
ERROR: selftest failed for M3321928619
no factor found
Selftest statistics
number of tests 107
successfull tests 51
no factor found 56

selftest FAILED!
random selftest offset was: 12734519[/CODE]

flashjh 2015-04-10 14:34

It looks like you may be building with code for another card. You need to make sure the makefile has the proper "arch=" and "code=" lines.

[STRIKE]What video card are you using? This will determine what you should have there.[/STRIKE]

Duh, it was in your post.

Make sure you have --generate-code arch=compute_50,code=sm_50

You could also use --generate-code arch=compute_52,code=sm_52, as that will work with a 980.
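
As a small illustration of the advice above, the Makefile entries can be generated from a list of compute capabilities. This is only a sketch: the helper name `gencode_flags` is hypothetical, and the flag syntax is copied from the post rather than taken from mfaktc's Makefile.

```python
def gencode_flags(compute_capabilities):
    """Build nvcc --generate-code options for capabilities such as "5.2"."""
    flags = []
    for cc in compute_capabilities:
        major, minor = cc.split(".")
        num = major + minor  # "5.2" -> "52"
        flags.append("--generate-code arch=compute_%s,code=sm_%s" % (num, num))
    return " ".join(flags)


# The GTX 980 in the log above reports compute capability 5.2.
print(gencode_flags(["5.0", "5.2"]))
```

Listing several capabilities produces one `--generate-code` entry per target, so a single binary can carry code for more than one GPU generation.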

TheJudger 2015-04-16 22:17

Hi (and sorry for late reply),


[QUOTE=preda;399753]After compiling 0.21 from source with Cuda toolkit 7.0, I consistently get this self-test failure (always the same number of tests passed/failed at the end). Any hints are appreciated, thanks.

[CODE]mfaktc v0.21 (64bit built)
[...]
GPU Sieving enabled
[...]
[/CODE][/QUOTE]

to debug this issue can you disable GPU sieving (mfaktc.ini) and rerun?

Which CUDA version?
Which driver version?
Any chance to test CUDA toolkit 6.5?

Oliver

Ralf Recker 2015-05-09 13:31

[QUOTE=TheJudger;400251]to debug this issue can you disable GPU sieving (mfaktc.ini) and rerun?

Which CUDA version?
Which driver version?
Any chance to test CUDA toolkit 6.5?

Oliver[/QUOTE]
I get the same error here (mfaktc 0.20 and 0.21/CUDA toolkit 7.0/Driver versions 346.47, 346.59 and 349.16/Debian 8 and CentOS 7)
on a Maxwell card (GTX 970/compute_52/sm_52) but [B]not[/B] on a Kepler card (GTX 650/compute_30/sm_30).

Disabling GPU sieving doesn't help.

- A compiled binary of mfaktc 0.20 (CUDA toolkit 6.5/Debian 7) worked without problems on the GTX 970 (compute_52/sm_52).
- Compiled binaries of mfaktc 0.20 and 0.21 (CUDA toolkit 6.5/CentOS 7) work without problems on the GTX 970 (compute_52/sm_52).

- The binary from mersenne.ca (downloading from [URL="http://www.mersenneforum.org"]www.mersenneforum.org[/URL] is blocked) works without problems.

TheJudger 2015-05-15 21:33

Hi Ralf,

can you run the problematic binary with "-st" (just a few seconds) and tell me whether it fails for specific kernels or does it fail all/"random"?

Oliver

Ralf Recker 2015-05-16 09:45

[QUOTE=TheJudger;402368]Hi Ralf,

can you run the problematic binary with "-st" (just a few seconds) and tell me whether it fails for specific kernels or does it fail all/"random"?

Oliver[/QUOTE]

Here is the partial output from a -st run of mfaktc 0.21 on a GTX 970 (Debian 8.0/CUDA Toolkit 7.0).

[CODE]
Selftest statistics
number of tests 26192
successfull tests 15238
no factor found 10954

kernel             | success |   fail
-------------------+---------+-------
UNKNOWN kernel     |       0 |      0
71bit_mul24        |    2586 |      0
75bit_mul32        |    1021 |   1661
95bit_mul32        |    1024 |   1843
barrett76_mul32    |    1096 |      0
barrett77_mul32    |    1114 |      0
barrett79_mul32    |       0 |   1153
barrett87_mul32    |    1066 |      0
barrett88_mul32    |    1069 |      0
barrett92_mul32    |       0 |   1084
75bit_mul32_gs     |     997 |   1423
95bit_mul32_gs     |     999 |   1598
barrett76_mul32_gs |    1079 |      0
barrett77_mul32_gs |    1096 |      0
barrett79_mul32_gs |       0 |   1130
barrett87_mul32_gs |    1044 |      0
barrett88_mul32_gs |    1047 |      0
barrett92_mul32_gs |       0 |   1062

selftest FAILED!
random selftest offset was: 9507477
[/CODE]

With an additional Makefile target for compute_52/sm_52:

[CODE]
Selftest statistics
number of tests 26192
successfull tests 15238
no factor found 10954

kernel             | success |   fail
-------------------+---------+-------
UNKNOWN kernel     |       0 |      0
71bit_mul24        |    2586 |      0
75bit_mul32        |    1021 |   1661
95bit_mul32        |    1024 |   1843
barrett76_mul32    |    1096 |      0
barrett77_mul32    |    1114 |      0
barrett79_mul32    |       0 |   1153
barrett87_mul32    |    1066 |      0
barrett88_mul32    |    1069 |      0
barrett92_mul32    |       0 |   1084
75bit_mul32_gs     |     997 |   1423
95bit_mul32_gs     |     999 |   1598
barrett76_mul32_gs |    1079 |      0
barrett77_mul32_gs |    1096 |      0
barrett79_mul32_gs |       0 |   1130
barrett87_mul32_gs |    1044 |      0
barrett88_mul32_gs |    1047 |      0
barrett92_mul32_gs |       0 |   1062

selftest FAILED!
random selftest offset was: 5153388
[/CODE]
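
The per-kernel table above can be summarized mechanically to see which kernels fail every test and which fail only some. The following is a rough sketch; the parsing helpers are hypothetical and only assume the `name | success | fail` layout shown in the output.

```python
TABLE = """\
kernel | success | fail
-------------------+---------+-------
UNKNOWN kernel | 0 | 0
71bit_mul24 | 2586 | 0
75bit_mul32 | 1021 | 1661
95bit_mul32 | 1024 | 1843
barrett76_mul32 | 1096 | 0
barrett77_mul32 | 1114 | 0
barrett79_mul32 | 0 | 1153
barrett87_mul32 | 1066 | 0
barrett88_mul32 | 1069 | 0
barrett92_mul32 | 0 | 1084
75bit_mul32_gs | 997 | 1423
95bit_mul32_gs | 999 | 1598
barrett76_mul32_gs | 1079 | 0
barrett77_mul32_gs | 1096 | 0
barrett79_mul32_gs | 0 | 1130
barrett87_mul32_gs | 1044 | 0
barrett88_mul32_gs | 1047 | 0
barrett92_mul32_gs | 0 | 1062
"""


def parse_kernel_stats(text):
    """Return {kernel: (success, fail)}, skipping header and separator rows."""
    stats = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        stats[parts[0]] = (int(parts[1]), int(parts[2]))
    return stats


def classify(stats):
    """Split failing kernels into (always failing, partly failing)."""
    always = sorted(k for k, (ok, bad) in stats.items() if bad and not ok)
    partly = sorted(k for k, (ok, bad) in stats.items() if bad and ok)
    return always, partly


stats = parse_kernel_stats(TABLE)
always, partly = classify(stats)
print("always failing:", always)
print("partly failing:", partly)
```

The column sums match the reported totals (15238 successful, 10954 with no factor found). In this run the barrett79 and barrett92 kernels fail every test, the 75bit and 95bit kernels fail some tests, and the remaining kernels pass, with and without GPU sieving.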

TheJudger 2015-05-16 10:24

Hi Ralf,

while you wrote this I was able to reproduce it on my system, too (GTX 980, CUDA 7.0, mfaktc 0.22-pre2).
I see [B]exactly[/B] the same numbers of failed and passed selftests (except for 71bit_mul24, which is expected because this kernel is removed in 0.22), so at least the issue is easy to reproduce (and static).

-- edit --
[CODE]#define DEBUG_GPU_MATH[/CODE] doesn't show anything... [I]"interesting"[/I]
-- edit --

Oliver

TheJudger 2015-05-17 18:51

Hi,

did some tests with the barrett79 kernel... it seems that the differences are inside the main loop, which is mostly integer code here.
Comparing PTX output of CUDA 6.5 vs. 7.0 isn't fun at all...

Oliver

TheJudger 2015-05-17 19:03

Testcase: M332195503 from 2[SUP]64[/SUP] to 2[SUP]65[/SUP] (there is a known factor in this range: 23099992436515618207); I hacked the code to force barrett79 usage.
Left side: CUDA 6.5
Right side: CUDA 7.0
[CODE]Mooh Mooh
u = 0xCC6E77DC 0718B873 DABCC754 u = 0xCC6E77DC 0718B873 DABCC754
main loop start main loop start
tmp96 = 0x00000000 4B0C159B 8DA668D7 tmp96 = 0x00000000 4B0C159B 8DA668D7
a = 0x00000000 4B0C159B 8DA668D7 | a = 0x00000000 4B0C159[B][COLOR="Red"]A[/COLOR][/B] 8DA668D7
b = 0x00000000 1600153B 2D67ACC9 F13EF602 F7C36491 | b = 0x00000000 1600153A 974F8193 D5F22454 F7C36491
[/CODE]
[B][COLOR="Red"]WTF?[/COLOR][/B] Srsly? Reminds me of [URL="http://mersenneforum.org/showthread.php?p=306728&highlight=carry#post306728"]this[/URL]...

Oliver
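
The factor quoted for this testcase can be checked on the CPU, independently of any GPU kernel, with a modular exponentiation: f divides 2^p - 1 exactly when 2^p mod f equals 1. The sketch below uses Python's built-in three-argument `pow`; the helper names are hypothetical, and the form test relies on the standard facts that a factor of 2^p - 1 (p an odd prime) has the form 2kp + 1 and is congruent to 1 or 7 modulo 8.

```python
def divides_mersenne(p, f):
    """True if f divides 2**p - 1, i.e. 2**p mod f == 1."""
    return pow(2, p, f) == 1


def has_factor_form(p, f):
    """Necessary conditions for f to divide 2**p - 1 (p an odd prime)."""
    return (f - 1) % (2 * p) == 0 and f % 8 in (1, 7)


P = 332195503
F = 23099992436515618207

print("in range 2^64..2^65:", 2**64 < F < 2**65)
print("has form 2kp+1:     ", has_factor_form(P, F))
print("divides 2^p - 1:    ", divides_mersenne(P, F))
```

A host-side check like this is useful when GPU results are in doubt, because it does not depend on the carry-chain arithmetic under investigation.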

TheJudger 2015-05-18 17:40

OK, indeed a bug with CUDA 7.0 (and/or drivers).

My latest development version runs a small check for this.

CUDA 6.5 + 346.72:
[CODE]./mfaktc.exe -v 2
mfaktc v0.22-pre3 (64bit built)
[...]
CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 7.0
[...]
check_subcc_bug()
input: mystuff->h_RES[2..0] = 0x33333333 22222222 11111111
output: mystuff->h_RES[5..3] = 0x33333333 22222222 11111111
passed, output == input
[...][/CODE]

CUDA 7.0 + 346.72:
[CODE]./mfaktc.exe
mfaktc v0.22-pre3 (64bit built)
[...]
CUDA version info
binary compiled for CUDA 7.0
CUDA runtime version 7.0
CUDA driver version 7.0
[...]
check_subcc_bug()
input: mystuff->h_RES[2..0] = 0x33333333 22222222 11111111
output: mystuff->h_RES[5..3] = 0x33333333 22222221 11111111
ERROR: output != input

could be caused by bad software environment (CUDA toolkit and/or graphics driver)
Known bad:
- CUDA 5.0.7RC + 302.06.03 with all supported GPUs
fixed by driver update after reported this issue to nvidia
- CUDA 7.0 + 346.47, 346.59, 346.72 and 349.16 346.72 with Maxwell GPUs
[...]
[/CODE]

[I]check_subcc_bug()[/I] is silent unless[LIST][*]verbosity is greater than or equal to 2[*]the sub.cc bug is detected[/LIST]
Oliver

firejuggler 2015-05-18 18:17

So, Oliver, what do we do? What should we avoid? Who should avoid what?
Do we wait for you to fix it?

TheJudger 2015-05-18 18:33

[QUOTE=firejuggler;402552]So, Oliver, what do we do? What should we avoid? Who should avoid what?[/QUOTE]
So it affects CUDA 7.0 + Maxwell class GPUs, just don't use this combination (if you try to do so the builtin selftest will deny productive usage). Right now I see no benefit of CUDA 7.0 over 6.5.

[QUOTE=firejuggler;402552]Do we wait for you to fix it?[/QUOTE]
Unless nvidia employs me and I learn how to write graphics drivers -> no (nvidia needs to fix!)

Oliver

firejuggler 2015-05-18 21:04

so, avoid 960/970/980/titan and 7.0 cuda drivers, got it.

TheJudger 2015-05-18 21:14

[QUOTE=firejuggler;402575]so, avoid 960/970/980/titan and 7.0 cuda drivers, got it.[/QUOTE]

CUDA 7.0 capable drivers seem to be OK, mfaktc binary compiled with [B]CUDA Toolkit 7.0[/B] triggers the bug (inside the driver?)
At least for Titan X it will be impossible(?) to find a driver not capable of CUDA 7.0.

To make a long story short: use CUDA 6.5 binaries and you don't have to care about this bug.

Oliver

LaurV 2015-05-19 03:00

[QUOTE=firejuggler;402575]so, avoid 960/970/980/titan [COLOR=Red]X[/COLOR] and 7.0 cuda drivers, got it.[/QUOTE]
FTFY. The plain titans and blacks are ok (in fact, my titan is faster with 5.5, no reason to use 7.0)
[edit: just to clarify, titan Z should also be ok, as it is in fact two plain titans fused together]

TheJudger 2015-05-19 15:46

GTX 960/970/980 and Titan X are OK, too (indeed they are very nice cards!).
Just don't use CUDA Toolkit 7.0 when compiling mfaktc.

Oliver

preda 2015-05-21 14:48

subcc bug in cuda 7.0
 
This is a bug confirmed by Nvidia in CUDA Toolkit 7.0; they have a fix that will be released in the next toolkit release after 7.0.
The bug concerns sub with carry -- the carry being sometimes set when it shouldn't.
I suspect it may be wrong assembler generated by ptxas (as the PTX is correct).

The workaround is to compile with CUDA toolkit 6.5.

TheJudger 2015-05-21 22:51

Where/when did they confirm? My bug report's last action was just
[QUOTE]Status changed from "Open - pending review" to "Open - in progress"[/QUOTE]

I'm not even sure whether this is a bug in the CUDA toolkit or the driver itself, last time they fixed it with an updated driver.

Oliver

preda 2015-05-22 09:42

subcc bug in cuda 7.0
 
My bug report to them is here: [url]https://developer.nvidia.com/nvbugs/cuda/edit/1642061[/url] but I suppose only I can view that bug (unfortunately).

Here's a brief on that bug report:

The code below generates wrong behavior for the multiword subtraction with borrow sub() routine, where a spurious borrow takes place. Basically:
{0, 0, 0, 0, 1, 0} - {0, 0, 0, 0, 0, 0} returns {0, 0xffffffff, 0xffffffff, 0xffffffff, 0, 0}.
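
For comparison, here is what a correct multiword subtraction with borrow does on that input. This is a plain Python reference sketch, not the CUDA code from the bug report. It assumes 32-bit words stored least significant first; that ordering is an assumption, chosen because it is consistent with the reported wrong result.

```python
MASK = 0xFFFFFFFF


def sub_words(a, b):
    """Return (a - b, final borrow) for equal-length lists of 32-bit words,
    least significant word first."""
    out, borrow = [], 0
    for x, y in zip(a, b):
        d = x - y - borrow
        borrow = 1 if d < 0 else 0
        out.append(d & MASK)
    return out, borrow


def to_int(words):
    """Combine little-endian 32-bit words into one integer."""
    return sum(w << (32 * i) for i, w in enumerate(words))


a = [0, 0, 0, 0, 1, 0]
b = [0, 0, 0, 0, 0, 0]
reported = [0, MASK, MASK, MASK, 0, 0]  # wrong result quoted in the report

print(sub_words(a, b))
print(hex(to_int(a) - to_int(reported)))
```

Under the assumed word order, the reported result is smaller than the correct one by exactly 2^32, which is what a single spurious borrow into word 1 would produce.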

Here is the source for reproduction: [url]http://pastebin.com/Ab7d8YAh[/url]. Compile with: nvcc -std=c++11 --gpu-architecture sm_50 bug.cu. I ran it on a GTX 980.

On that bug this is the last update from Nvidia:
"This issue has been fixed in our development versions, and the fix would be available for you in the next CUDA release that following of 7.0. Thanks again for the reporting!"

ixfd64 2015-05-24 22:16

There is usually some display lag when mfaktc is running. However, my computer is running a lot smoother now even though I did not change any mfaktc settings. Could this be related to the latest driver update? Anyone else seeing the same thing?

kladner 2015-05-24 22:22

[QUOTE=ixfd64;402917]There is usually some display lag when mfaktc is running. However, my computer is running a lot smoother now even though I did not change any mfaktc settings. Could this be related to the latest driver update? Anyone else seeing the same thing?[/QUOTE]

Lag still comes and goes for me, with driver 350.12. If I want the screen to be responsive enough for playing video or editing images, I restart the 580 that runs the display with something like 'GPUSieveSize=40'.

TheJudger 2015-07-08 19:02

subcc bug persists in CUDA 7.5 RC
 
CUDA 7.5 RC is available for registered developers.

Using [B]cuda_7.5.7_rc_linux.run[/B] (includes driver 352.07 and nvcc 7.5.6) the problem is [B][U]NOT fixed[/U][/B].

So for now:[LIST][*]don't use CUDA toolkit 7.0 for [B][U]compiling[/U][/B] mfaktc[*]don't use CUDA toolkit 7.5 RC for [B][U]compiling[/U][/B] mfaktc[/LIST]
I repeat myself: using a CUDA 7 capable [B]driver[/B] is okay for mfaktc!

Oliver

TheJudger 2015-07-30 20:27

Hi everyone,

just a little update: it is not only "the subcc" bug; there are at least two issues with CUDA 7.0 and 7.5 RC with regard to mfaktc.
Nvidia told me that the subcc issue is fixed in their internal build and that the fix will be included in the final CUDA 7.5 package.
Nvidia told me that they have fixed the other issue(s) in their internal build, too. Unfortunately, this fix won't be included in CUDA 7.5... They'll fix this in a "feature release".

So my previous [URL="http://www.mersenneforum.org/showpost.php?p=405539&postcount=2562"]post[/URL] is still valid and there is a high chance that CUDA 7.5 (final) needs to be added to the list. :sad:

The good news is that there is (right now) no need for CUDA 7.x in mfaktc:[LIST][*]no new GPUs supported by CUDA 7.x.[*]I didn't notice anything that might increase the performance of mfaktc (such as the "funnel shift" in CUDA 5.0)[/LIST]
Oliver

TheDomis 2015-08-07 12:07

Hi, can someone tell me why CUDA 6.5 mfaktc doesn't launch on Windows XP 32bit? It says it's an invalid Win32 application. I tried to redownload it but it still doesn't launch, the CUDA 4.2 version launches normally.

Specs:
AMD Athlon 64 3200+ Venice OCed to 2.3GHz
1.5GB RAM
GeForce GT 630

LaurV 2015-08-08 03:47

Missing libraries? Do you have cudart 6.5 dlls?
Try opening a command prompt and launch the program from there, which may allow you to see the real error (before the window disappears).

TheDomis 2015-08-08 15:59

[url]http://imgur.com/xYNZCbU[/url] This is what happens when I open it.
[url]http://imgur.com/UD0mELG[/url] When I close it it says access denied.

TheDomis 2015-08-16 18:56

[QUOTE=LaurV;407431]Missing libraries? Do you have cudart 6.5 dlls?
Try opening a command prompt and launch the program from there, which may allow you to see the real error (before the window disappears).[/QUOTE]

When I open it from the command prompt it says that it's not a valid win32 application, when I hit OK it says in the command prompt that access is denied.

Gordon 2015-08-16 19:00

[QUOTE=TheDomis;408101]When I open it from the command prompt it says that it's not a valid win32 application, when I hit OK it says in the command prompt that access is denied.[/QUOTE]

Perhaps need to run cmd.exe as administrator?

TheDomis 2015-08-16 19:03

OK I figured it out, apparently Win XP is unable to run VS2012 applications, it has to be compiled with VS2010 or .NET 4.0 I think. .NET 4.5 is unsupported in Win XP.
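
One way to test the toolchain explanation is to read the minimum subsystem version stored in the executable's PE header. This goes beyond what the thread states: as far as I know, Visual Studio 2012's default toolset marks executables as requiring subsystem version 6.0, and Windows XP (version 5.1 on 32-bit) rejects such files as "not a valid Win32 application". The sketch below reads that field; the function name is hypothetical, and the offsets follow the documented PE layout (the two version fields sit at offset 48 of the optional header for both PE32 and PE32+).

```python
import struct


def subsystem_version(pe_bytes):
    """Return (major, minor) subsystem version from the start of a PE file."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Signature (4 bytes) + COFF file header (20 bytes) precede the
    # optional header.
    optional_header = pe_offset + 4 + 20
    return struct.unpack_from("<HH", pe_bytes, optional_header + 48)
```

Reading the first few kilobytes of the executable in binary mode and passing them to `subsystem_version` is enough. A result of (6, 0) would be consistent with the failure on XP, while (5, 1) would point to a different cause.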

TheDomis 2015-08-16 22:10

1 Attachment(s)
If anyone has an old rig with WinXP but cannot run the latest version of mfaktc with CUDA 6.5, I have just compiled mfaktc-0.21 x86 CUDA 6.5 with VS2010 instead of VS2012, so it will run on WinXP. I couldn't compile a x64 version because this PC is running 32-bit Windows. Enjoy.

alpertron 2015-08-17 01:24

You can also create applications for Windows XP using Visual Studio 2013. Check this [URL="https://www.visualstudio.com/en-us/products/visual-studio-2013-compatibility-vs.aspx"]compatibility list for VS 2013[/URL].

TheDomis 2015-08-17 04:58

Yes, but you can't create or launch .NET 4.5 applications in Windows XP which I believe was the culprit.

vsuite 2015-08-25 03:44

[QUOTE=TheDomis;408103]OK I figured it out, apparently Win XP is unable to run VS2012 applications, it has to be compiled with VS2010 or .NET 4.0 I think. .NET 4.5 is unsupported in Win XP.[/QUOTE]

Thanks for solving it. I was getting that problem with WINXP and several of the win32 downloads (mfaktc and possibly also cudalucas).

flashjh 2015-09-17 00:06

[QUOTE=TheJudger;402624]GTX 960/970/980 and Titan X are OK, too (indeed they are very nice cards!).
Just don't use CUDA Toolkit 7.0 when compiling mfaktc.

Oliver[/QUOTE]

I installed a Titan Z and compiled a CUDA 7.5 version. Will the built-in self test check if everything is working now?

TheJudger 2015-09-17 08:09

Hi Jerry,

[QUOTE=flashjh;410558]I installed a Titan Z and compiled a CUDA 7.5 version. Will the built-in self test check if everything is working now?[/QUOTE]

please stay with CUDA [B][U]6.5[/U][/B]. CUDA 7.0 and CUDA 7.5 are broken for mfaktc (nvidia confirmed that it is a CUDA toolkit fault and will be fixed in future versions. Actually they fixed the subcc bug again in CUDA 7.5, but there are additional issues).

On the other hand your Titan Z is a Kepler chip, the issues mentioned above affect "only" Maxwell generation.

The good news is that even the "simple selftest" executed each time on startup fails on Maxwell with CUDA 7.0/7.5.

Oliver

flashjh 2015-09-17 11:09

Thanks. The 7.5 version is faster, I just didn't want to use it with the Z without double checking with you. I'm going to build a few others and do some testing.

Also, though CUDA 7.5 (possibly lower) supports MSVS 2013, the programs do not run when compiled. 2012 does work. Any thoughts?

TheJudger 2015-09-17 19:18

Hi Jerry,

for all builds I recommend running the long selftest (mfaktc.exe -st) on at least the target architecture. If it passes the long selftest on your Kepler card(s), it should be OK. But always keep in mind that once you upgrade to a Maxwell card, the binary will fail.
I haven't spent much time on testing mfaktc with CUDA 7.0/7.5 because of Maxwell issues.

Do you have more details than "doesn't work" in regard to MSVS 2013?

Oliver

flashjh 2015-09-18 00:25

[QUOTE=TheJudger;410672]Do you have more details than "doesn't work" in regard to MSVS 2013?

Oliver[/QUOTE]

Sorry Oliver, I knew better than that! Here is the output of a self test. This happens after building with MSVS 2013 on the 580 and the Titan Z
[CODE]########## testcase 1/2867 ##########
Starting trial factoring M50804297 from 2^67 to 2^68 (0.59 GHz-days)
Using GPU kernel "75bit_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Sep 17 18:40 | 3387 0.1% | 0.001 n.a. | n.a. 82485 n.a.%
ERROR: cudaGetLastError() returned 8: invalid device function[/CODE]

If I try to just run an exponent instead of the self test, it just errors:
[CODE]running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function[/CODE]

I had the same problem with 2013 on CUDA 6.5, but I never did much with it since 2012 works. I don't want anyone to spend time fixing it (at least for now), just didn't know if you had seen anything similar.

Jerry

TheJudger 2015-09-18 16:51

Hi Jerry,

one possibility to trigger "invalid device function" is the wrong compute capability. But I guess you already know this and did a proper build with e.g. CC 1.1, 2.0, 3.0, 3.5 and 5.0.

Oliver

flashjh 2015-09-18 16:54

Yes, used the same makefile for 2013 and 2012. CUDA 7.5 does not support 1.x anymore, not that it matters here.

TheJudger 2015-09-19 20:40

Hi Jerry,
[LIST][*]do you have "--ptxas-options=-v" set in Makefile? If so any differences in output (MSVC 2012 vs. 2013) during compile?[*]can you try to compile e.g. only for CC 2.0 and test it on your GTX 580 (or only CC 3.5 for your TITAN Z)?[/LIST]
Oliver

kladner 2016-03-15 02:27

Just now, when I submitted some TF results, all the entries had the red line seen below-
[QUOTE]processing: TF no-factor for [URL="http://www.mersenne.org/M75535921"]M75535921[/URL] (2^74-2^75)
[COLOR=Red][B]Notice: Undefined index: log_anyway in C:\inetpub\www\manual_result\manual_result.inc.php on line 120 [/B][/COLOR]
CPU credit is 50.6520 GHz-days.[/QUOTE]Does this indicate something misconfigured? Is it me or the server? There is no "\www" directory in inetpub on my machine, only "\wwwroot".

Prime95 2016-03-15 02:55

My bad, should be fixed now. The problem should not affect any of the results you submitted.

axn 2016-03-15 03:02

[QUOTE=kladner;429137]Is it me or the server? [/QUOTE]

It's the server. Think about it -- it will be a security nightmare if the server could just reach into your hard disk just like that!

kladner 2016-03-15 05:21

[QUOTE=Prime95;429142]My bad, should be fixed now. The problem should not affect any of the results you submitted.[/QUOTE]
The credit and completion came through, no problem.
[U]axn[/U] - I figured that was the case, but it seemed tactless to state that as fact.

ixfd64 2016-04-09 02:11

I was factoring an exponent from 73 to 74 bits when the computer completely froze, requiring a hard reboot. After starting up mfaktc again, I noticed that the factoring had started over, suggesting that the save file had been corrupted. I thought about uploading a copy of it here in case someone wanted to investigate the issue, but the file had already been overwritten. Just putting this out there.

On the subject of which, maybe it would be a good idea to give mfaktc the ability to create backup save files like Prime95 does?

TheJudger 2016-04-11 12:24

Hi,

[QUOTE=ixfd64;431080]I was factoring an exponent from 73 to 74 bits when the computer completely froze, requiring a hard reboot. After starting up mfaktc again, I noticed that the factoring had started over, suggesting that the save file had been corrupted. I thought about uploading a copy of it here in case someone wanted to investigate the issue, but the file had already been overwritten. Just putting this out there.

On the subject of which, maybe it would be a good idea to give mfaktc the ability to create backup save files like Prime95 does?[/QUOTE]

Short answer: No!

Longer answer: No! The checkpoints are most likely an atomic write to the filesystem (actually fopen(), a single fprintf() (less than 512 bytes) and a fclose()). Because the fprintf() is atomic it is very unlikely that this will yield a corrupted checkpoint. It could be an empty checkpoint file but that isn't very likely, either. If such things happen I would fix the computer before doing anything useful.

Maybe there was just no checkpoint because prior to the system lockup there wasn't much work done on that step?

Oliver

axn 2016-04-11 13:59

[QUOTE=TheJudger;431306]The checkpoints are most likely an atomic write to the filesystem (actually fopen(), a single fprintf() (less than 512 bytes) and a fclose()). Because the fprintf() is atomic it is very unlikely that this will yield a corrupted checkpoint. It could be an empty checkpoint file but that isn't very likely, either.[/QUOTE]

That doesn't sound atomic. If something goes wrong between fopen and fprintf, or more likely, if the OS hasn't actually propagated the write from memory to disk even after fopen->fprintf->fclose has completed, you'll end up with an empty checkpoint. It is rare, but can happen, even if everything is working exactly as expected. Hence the advantage of multiple checkpoint files -- even if one fails, the other one(s) will be there, and loss of work can be minimized.

James Heinrich 2016-04-11 15:15

[QUOTE=TheJudger;431306]atomic write to the filesystem (actually fopen(), a single fprintf() (less than 512 bytes) and a fclose())[/QUOTE][QUOTE=axn;431312]That doesn't sound atomic.[/QUOTE]A relatively minor change of writing to [i]Mxxxxx.tmp[/i] and then renaming over [i]Mxxxxx.ckp[/i] after the write is complete would atomize it. You'll either have the new checkpoint, or in worst case if a crash happens during the checkpoint-write process you'll have the previous checkpoint and a temp file (which may or may not be correctly written).
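
A minimal sketch of the write-then-rename idea in plain C, to make the ordering concrete. The function name checkpoint_write_via_temp and its parameters are made up for illustration and are not part of mfaktc. It assumes POSIX rename() semantics, where an existing target is replaced atomically; the Windows C runtime rename() fails if the target already exists, so a port would need something like MoveFileEx with MOVEFILE_REPLACE_EXISTING. It also does not force the data to disk (that would take fsync() or FlushFileBuffers()), so it narrows the window axn describes rather than closing it.

```c
#include <stdio.h>

/* Hypothetical helper, not part of mfaktc: write `line` to `final_name`
   by way of `tmp_name`. Returns 0 on success, -1 on any failure. */
int checkpoint_write_via_temp(const char *final_name, const char *tmp_name,
                              const char *line)
{
    FILE *f = fopen(tmp_name, "w");

    if (f == NULL)
        return -1;

    /* Write the complete checkpoint into the temporary file first. */
    if (fputs(line, f) == EOF || fflush(f) == EOF) {
        fclose(f);
        remove(tmp_name);
        return -1;
    }
    if (fclose(f) == EOF) {
        remove(tmp_name);
        return -1;
    }

    /* Only now replace the previous checkpoint. Assumption: POSIX rename(),
       which swaps the name atomically. Until this call succeeds, the old
       checkpoint file is still intact under its final name. */
    if (rename(tmp_name, final_name) != 0) {
        remove(tmp_name);
        return -1;
    }
    return 0;
}
```

Called as checkpoint_write_via_temp("M332347303.ckp", "M332347303.tmp", line), a crash before the rename leaves the old .ckp file untouched.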

TheJudger 2016-04-11 16:07

This is not rocket science, let's keep it simple. One [B]could[/B] argue that if you can't write a simple checkpoint reliably on your machine you won't trust the main calculation, either.

Oliver

ixfd64 2016-04-11 17:12

[QUOTE=TheJudger;431306]Maybe there was just no checkpoint because prior to the system lockup there wasn't much work done on that step?[/QUOTE]

When I started up mfaktc before the crash, the assignment was already about 37% done, so I'm pretty sure there was a checkpoint.

On the subject of which, is there any way to tell mfaktc to start at a certain class other than by hacking the save file?

TheJudger 2016-04-11 21:04

[QUOTE=ixfd64;431328]On the subject of which, is there any way to tell mfaktc to start at a certain class other than by hacking the save file?[/QUOTE]

Except hacking the code? No (for obvious reasons).

Oliver

TObject 2016-04-11 21:33

Windows 7 had a nice feature, called “Previous Versions” (Windows 8 and later have it replaced with something called “File History” which is not as good).

You just right click on a file and you can see or restore a previous version.

This functionality is also usually available with NAS configured to make automatic snapshots.
Or, at the very least, checkpoint files can be restored from daily backups, manually.

Last time I checked, mfaktc was open source. The encryption routine is just a few lines long. Modifying checkpoint files is useful when one wants to split a bit level among several GPUs.

TheJudger 2016-04-12 09:44

Hi,

[QUOTE=TObject;431343]Last time I checked, mfaktc was open source. The encryption routine is just a few lines long. Modifying checkpoint files is useful when one wants to split a bit level among several GPUs.[/QUOTE]

it is [B]not encryption[/B], it is just a [B]checksum[/B].
Is there any good reason why you would split a single assignment across multiple GPUs on a regular basis?
I'm afraid this discussion leads to a "howto forge false results" even if not intended by you.

Oliver

LaurV 2016-04-12 13:42

We are with Oliver here.
We used mfaktc for years and never had problems with checkpoint files. [edit: we do checkpoint every 30 minutes, or so]

Also, if really needed, for an assignment that would take ages, splitting one expo over many cards is no problem; one simple pari or perl script can create the checkpoint file to start with some predetermined class. [edit: you still have to watch them to know when to stop each of them, except the last, which stops by itself after the last class]

bayanne 2016-06-07 10:27

0.21 for a Mac
 
Has anyone compiled mfaktc 0.21 for a Mac?

airsquirrels 2016-06-10 16:16

I compiled mfaktc using CUDA 8.0 and compute 6.1 for the Pascal 1080 Founders Edition, but it looks like the bugs in 7.0 and 7.5 are still present. 43/107 self tests failed.

TheJudger 2016-06-10 22:03

[QUOTE=airsquirrels;435955]I compiled mfaktc using CUDA 8.0 and compute 6.1 for the Pascal 1080 Founders Edition, but it looks like the bugs in 7.0 and 7.5 are still present. 43/107 self tests failed.[/QUOTE]

You're talking about CUDA 8.0 RC I guess... right and wrong. At least they fixed the sub_cc bug on Maxwell, which allows me to analyze the remaining bug(s)...

Oliver

P.S. reopened the bugreport already

TheJudger 2016-06-10 23:26

Can you run an unmodified mfaktc 0.21 selftest (mfaktc.exe -st) on your Pascal GPU?
CUDA 8.0RC doesn't look that bad for me on my GTX980 (compute 5.2):
[CODE]
kernel             | success |   fail
-------------------+---------+-------
UNKNOWN kernel     |       0 |      0
75bit_mul32        |       0 |   2682
95bit_mul32        |       0 |   2867
barrett76_mul32    |    1096 |      0
barrett77_mul32    |    1114 |      0
barrett79_mul32    |    1153 |      0
barrett87_mul32    |    1066 |      0
barrett88_mul32    |    1069 |      0
barrett92_mul32    |    1084 |      0
75bit_mul32_gs     |       0 |   2420
95bit_mul32_gs     |       0 |   2597
barrett76_mul32_gs |    1079 |      0
barrett77_mul32_gs |    1096 |      0
barrett79_mul32_gs |    1130 |      0
barrett87_mul32_gs |    1044 |      0
barrett88_mul32_gs |    1047 |      0
barrett92_mul32_gs |    1062 |      0
[/CODE]
Much better than 7.0 and 7.5 with the subcc bug...

Oliver

airsquirrels 2016-06-11 01:33

Here is 8.0 with compute 6.1, I will redo with the same compute as you:

UPDATE: compute less than 6.1 throws 'ERROR: cudaGetLastError() returned 8: invalid device function'

[CODE]
Selftest statistics
number of tests 26192
successfull tests 13434
no factor found 12758

kernel             | success |   fail
-------------------+---------+-------
UNKNOWN kernel     |       0 |      0
71bit_mul24        |    2586 |      0
75bit_mul32        |       0 |   2682
95bit_mul32        |       0 |   2867
barrett76_mul32    |    1096 |      0
barrett77_mul32    |    1114 |      0
barrett79_mul32    |    1153 |      0
barrett87_mul32    |    1066 |      0
barrett88_mul32    |    1069 |      0
barrett92_mul32    |    1084 |      0
75bit_mul32_gs     |       0 |   2420
95bit_mul32_gs     |       0 |   2597
barrett76_mul32_gs |    1079 |      0
barrett77_mul32_gs |    1096 |      0
barrett79_mul32_gs |       0 |   1130
barrett87_mul32_gs |    1044 |      0
barrett88_mul32_gs |    1047 |      0
barrett92_mul32_gs |       0 |   1062
[/CODE]

TheJudger 2016-06-11 16:01

:sad: :sad: :sad: :sad: :sad:
Looks like they (nvidia) have a big issue with subcc... looks like the same bug is not fixed yet for Pascal...

Nvidia doesn't like me/mfaktc!

Oliver

TheJudger 2016-06-13 18:14

Hi,

thanks to David we know[LIST][*]that you can't run mfaktc on Pascal (GTX 1070/1080) [B][U]today[/U][/B] (likely a bug in CUDA 8.0RC)[*]the issue is the same as with Maxwell (but with Maxwell you can go for CUDA 6.5) (CUDA 7.0 and 7.5 have even worse bugs related to [I]subcc[/I]).[/LIST]

For now [B][U]I guess[/U][/B] that nvidia didn't fix the subcc bug completely. :sad:

[B][U]For now[/U][/B] I can't recommend buying a Pascal GPU if the (only) purpose is running mfaktc! That is sad because the performance numbers are really sweet... over 1THz equivalent (David's GTX 1080) with less than 200W. That is more than 5GHz equivalent per watt!

Oliver

TheJudger 2016-06-24 22:21

[B][I][U]should[/U][/I][/B] be fixed in final CUDA 8.0.

Oliver

ET_ 2016-06-27 10:31

[QUOTE=TheJudger;436892][B][I][U]should[/U][/I][/B] be fixed in final CUDA 8.0.

Oliver[/QUOTE]

:bow:

mattmill30 2016-07-27 11:27

failwell enhancement for checkpoint write error
 
It would be useful following a 'WARNING, could not write checkpoint file "M#########.ckp"' for the checkpoint to be output to stdout so that the file can be manually created if necessary. Ideally this could be enabled for every checkpoint with the introduction of an additional mfaktc.ini parameter or a new switch.

[QUOTE=LaurV;431381]We are with Oliver here.
We used mfaktc for years and never had problems with checkpoint files. [edit: we do checkpoint every 30 minutes, or so]

Also, if really needed, for an assignment that would take ages, splitting one expo over many cards is no problem; one simple pari or perl script can create the checkpoint file to start with some predetermined class. [edit: you still have to watch them to know when to stop each of them, except the last, which stops by itself after the last class][/QUOTE]

Do you have the script available? Or are you able to generate the checksum for class 1808 of a multi-bit range factoring of 2^77 to 2^81 for M332347303?

LaurV 2016-07-28 05:48

Can you post the contents of [U]any[/U] checkpoint file? (make one, copy paste the text here).
So I can adjust the checksum to match yours; the calculation differs between mfaktc versions.

I don't have access to my computer at home right now (something is wrong there, I am at work, lunch break), but I can put together a small C program to do that, by shamelessly copying from Oliver's code, which is public, on the web. The first two functions of "checkpoint.c", in the mfaktc distribution, are all you need: ctrl+c, ctrl+v into your favorite IDE, then add a "main" and here you go.

[CODE]
#include "stdafx.h"
#include <stdio.h>   /* printf, fprintf, FILE */
#include <string.h>  /* strlen */
#include <conio.h>   /* _getch */
#include <tchar.h>   /* _tmain, _TCHAR */

#define NUM_CLASSES 4620
#define MFAKTC_VERSION "0.20"

unsigned int checkpoint_checksum(char *string, int chars)
/* generates a CRC-32 like checksum of the string */
{
    unsigned int chksum = 0;
    int i, j;

    for (i = 0; i < chars; i++)
    {
        for (j = 7; j >= 0; j--) /* bits of each character, most significant first */
        {
            if ((chksum >> 31) == (((unsigned int)(string[i] >> j)) & 1))
            {
                chksum <<= 1;
            }
            else
            {
                chksum = (chksum << 1) ^ 0x04C11DB7;
            }
        }
    }
    return chksum;
}

// writes the checkpoint file
void checkpoint_write(unsigned int exp, int bit_min, int bit_max, int cur_class, int num_factors)
{
    FILE *f;
    char buffer[100], filename[20];
    unsigned int i;

    sprintf_s(filename, "M%u.ckp", exp);

    fopen_s(&f, filename, "w");
    if (f == NULL)
    {
        /* report the file we actually tried to open */
        printf("WARNING, could not write checkpoint file \"%s\"\n", filename);
    }
    else
    {
        /* the checksum covers everything before the final hex field */
        sprintf_s(buffer, "%u %d %d %d %s: %d %d", exp, bit_min, bit_max, NUM_CLASSES, MFAKTC_VERSION, cur_class, num_factors);
        i = checkpoint_checksum(buffer, (int)strlen(buffer));
        fprintf(f, "%u %d %d %d %s: %d %d %08X", exp, bit_min, bit_max, NUM_CLASSES, MFAKTC_VERSION, cur_class, num_factors, i);
        fclose(f);
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    unsigned int exp;
    int bmin, bmax, cls;
    char ch;

    printf("Exponent : "); scanf_s("%u", &exp);
    printf("From bitlevel : "); scanf_s("%d", &bmin);
    printf("To bitlevel : "); scanf_s("%d", &bmax);
    printf("Current class : "); scanf_s("%d", &cls);
    checkpoint_write(exp, bmin, bmax, cls, 0); //assume no factors were found by former runs
    printf("\nDone. Use it at your own risk...\nPress a key to exit.");
    ch = _getch();
    return 0;
}

[/CODE]

Assuming you can't compile, and assuming my code is right, and assuming you use version 0.20 of the code, this is what is generated for your data:
[CODE]332347303 77 81 4620 0.20: 1808 0 A60FF311[/CODE]

mattmill30 2016-07-30 12:40

[QUOTE=LaurV;438881]Can you post the contents of [U]any[/U] checkpoint file? (make one, copy paste the text here).
So I can adjust the checksum to match yours; the calculation differs between mfaktc versions. [/QUOTE]

Thanks LaurV, I didn't consider the checksum might differ between versions, since they don't with prime95.

Here is an earlier checkpoint:
[CODE]M332347303 77 81 4620 0.21: 1805 0 57B2DB5F[/CODE]


[QUOTE=LaurV;438881]Assuming you can't compile, and assuming my code is right, and assuming you use version 0.20 of the code, this is what is generated for your data:
[CODE]332347303 77 81 4620 0.20: 1808 0 A60FF311[/CODE][/QUOTE]

I did test the checkpoint you generated, but as you noted, since I'm using a different version (0.21), mfaktc didn't recognise it.

mattmill30 2016-07-30 13:55

FYI, I have completed the lost TF work and the checkpoint reads:
[CODE]M332347303 77 81 4620 0.21: 1808 0 5FCDA1FC[/CODE]

LaurV 2016-08-02 03:44

Ok, sorry I didn't have time to revisit this topic; in fact I didn't consider it a priority anymore, because I saw you had redone the work anyhow. Just to tie up the loose ends, here is the code that does the checksum for version 0.21, also copied from Oliver's code, which is available on the web. I only replaced scanf/open/etc. with their "safe" versions to avoid VC++ making a big scandal of it...

Version 0.21 added that "M" in front, to distinguish from "W" when mfaktc is used for Wagstaff numbers. Hence the difference in the file. This code [U]does[/U] generate checksums as you expect (and matching what you posted here, I tested it). To generate checksums for Wagstaff numbers, you have to modify the define (or define WAGSTAFF).

[CODE]
#include "stdafx.h"
#include <stdio.h>   /* printf, fprintf, FILE */
#include <string.h>  /* strlen */
#include <conio.h>   /* _getch */
#include <tchar.h>   /* _tmain, _TCHAR */

#define NUM_CLASSES 4620
#define MFAKTC_VERSION "0.21"

#ifdef WAGSTAFF
#define NAME_NUMBERS "W"
#else /* Mersennes */
#define NAME_NUMBERS "M"
#endif

unsigned int checkpoint_checksum(char *string, int chars)
/* generates a CRC-32 like checksum of the string */
{
    unsigned int chksum = 0;
    int i, j;

    for (i = 0; i < chars; i++)
    {
        for (j = 7; j >= 0; j--) /* bits of each character, most significant first */
        {
            if ((chksum >> 31) == (((unsigned int)(string[i] >> j)) & 1))
            {
                chksum <<= 1;
            }
            else
            {
                chksum = (chksum << 1) ^ 0x04C11DB7;
            }
        }
    }
    return chksum;
}

// writes the checkpoint file
void checkpoint_write(unsigned int exp, int bit_min, int bit_max, int cur_class, int num_factors)
{
    FILE *f;
    char buffer[100], filename[20];
    unsigned int i;

    sprintf_s(filename, "%s%u.ckp", NAME_NUMBERS, exp);

    fopen_s(&f, filename, "w");
    if (f == NULL)
    {
        printf("WARNING, could not write checkpoint file \"%s\"\n", filename);
    }
    else
    {
        /* the checksum covers everything before the final hex field, including the "M"/"W" prefix */
        sprintf_s(buffer, "%s%u %d %d %d %s: %d %d", NAME_NUMBERS, exp, bit_min, bit_max, NUM_CLASSES, MFAKTC_VERSION, cur_class, num_factors);
        i = checkpoint_checksum(buffer, (int)strlen(buffer));
        fprintf(f, "%s%u %d %d %d %s: %d %d %08X", NAME_NUMBERS, exp, bit_min, bit_max, NUM_CLASSES, MFAKTC_VERSION, cur_class, num_factors, i);
        fclose(f);
    }
}

//=======================================================
int _tmain(int argc, _TCHAR* argv[])
{
    unsigned int exp;
    int bmin, bmax, cls;
    char ch;

    printf("Exponent : "); scanf_s("%u", &exp);
    printf("From bitlevel : "); scanf_s("%d", &bmin);
    printf("To bitlevel : "); scanf_s("%d", &bmax);
    printf("Current class : "); scanf_s("%d", &cls);
    checkpoint_write(exp, bmin, bmax, cls, 0); //assume no factors were found by former runs
    printf("\nDone. Use it at your own risk...\nPress a key to exit.");
    ch = _getch();
    return 0;
}
[/CODE]

[CODE]M332347303 77 81 4620 0.21: 1808 0 5FCDA1FC[/CODE]
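
For completeness, the reverse direction: a small portable sketch that checks whether an existing checkpoint line carries a matching checksum. It reuses the checksum loop from the code above; checkpoint_line_is_valid is a hypothetical helper, not an mfaktc function, and it assumes the layout seen in this thread (everything up to the last space is checksummed, followed by eight hex digits) and a 32-bit unsigned int.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Same bit-by-bit loop as in the code posted above (CRC-32 polynomial,
   initial value 0, most significant bit first). Assumes 32-bit unsigned int. */
unsigned int checkpoint_checksum(const char *string, int chars)
{
    unsigned int chksum = 0;
    int i, j;

    for (i = 0; i < chars; i++) {
        for (j = 7; j >= 0; j--) {
            unsigned int bit = ((unsigned char)string[i] >> j) & 1u;
            if ((chksum >> 31) == bit)
                chksum <<= 1;
            else
                chksum = (chksum << 1) ^ 0x04C11DB7u;
        }
    }
    return chksum;
}

/* Hypothetical helper: returns 1 if `line` ends in a space followed by
   exactly eight hex digits that equal the checksum of the text before
   that space, and 0 otherwise. */
int checkpoint_line_is_valid(const char *line)
{
    const char *sep = strrchr(line, ' ');
    const char *tok;
    int i;

    if (sep == NULL)
        return 0;
    tok = sep + 1;
    /* The loop stops at the terminating NUL, so it never reads past the string. */
    for (i = 0; i < 8; i++)
        if (!isxdigit((unsigned char)tok[i]))
            return 0;
    if (tok[8] != '\0')
        return 0;
    return (unsigned int)strtoul(tok, NULL, 16)
           == checkpoint_checksum(line, (int)(sep - line));
}
```

The post above reports that this loop reproduces the 0.21 line; any edit to a single field without recomputing the last field makes the check fail.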

mattmill30 2016-08-13 22:11

Feature request: -tf extension, resume bit-range from particular class
 
Feature request:
Expansion of -tf switch to include support for beginning from a particular class.

This feature has at least two real world applications:[LIST=1][*]Resuming from the last checked class following a checkpoint write error [*]Resuming from a particular class following the successful discovery of a factor, in order to complete the bit-range[/LIST]
Additionally, if it is trivial to implement, the ability to resume from the bit-range and class in which a factor exists. I'm not sure how this would work alongside compound factors. An example for this usage would be when attempting to complete any remaining factorisation of an exponent such as [URL="http://www.mersenne.org/report_exponent/?exp_lo=9100919&full=1"]M9100919[/URL], where no bit-ranges have been included with factor submissions.

TheJudger 2016-08-13 22:23

Hi,

[QUOTE=mattmill30;439953]
Resuming from a particular class following the successful discovery of a factor, in order to complete the bit-range[/QUOTE]

Short: not possible!
Long: not possible, because we don't know which application reported the factor, which settings were used, etc. Prime95 splits the search space in residue classes mod 96(?) over the factor candidates (FCs) while mfaktc can do residue classes mod 420 or 4620 over the k in FC = 2kp+1.

Oliver

Prime95 2016-08-13 22:43

[QUOTE=TheJudger;439954]Prime95 splits the search space in residue classes mod 96(?)[/QUOTE]

mod 120

TheJudger 2016-08-14 12:10

Thank you for the correction. It was too late yesterday. I know the numbers for mfaktc and I know Prime95 uses somewhat fewer residue classes but had the wrong number in my mind.
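
To illustrate why the classes of the two programs cannot simply be mapped onto each other, here is a rough sketch that counts how many residue classes survive in each scheme. It assumes the usual filter for factors q = 2kp+1 of 2^p-1 (q must be 1 or 7 mod 8 and must not be divisible by the small primes built into the class count). The function names are invented for this example, and the real sieves in mfaktc and Prime95 may be organised differently.

```c
/* Hypothetical helper: counts the classes of k (mod num_classes) that can
   still contain a factor q = 2kp+1. Assumes num_classes is 420 or 4620 and
   that p is odd and not divisible by 3, 5, 7 or 11. */
int count_k_classes(unsigned long long p, int num_classes)
{
    static const int small_primes[] = {3, 5, 7, 11};
    int k, i, count = 0;

    for (k = 0; k < num_classes; k++) {
        /* Representative candidate of this class; fits easily in 64 bits. */
        unsigned long long q = 2ULL * (unsigned long long)k * p + 1ULL;
        int keep = (q % 8 == 1 || q % 8 == 7);

        /* Only primes that divide num_classes are decided by the class of k. */
        for (i = 0; keep && i < 4; i++)
            if (num_classes % small_primes[i] == 0 && q % small_primes[i] == 0)
                keep = 0;
        count += keep;
    }
    return count;
}

/* Same filter applied directly to the factor candidate q modulo 120,
   the modulus mentioned for Prime95 above. */
int count_q_residues_mod120(void)
{
    int r, count = 0;

    for (r = 0; r < 120; r++)
        if ((r % 8 == 1 || r % 8 == 7) && r % 3 != 0 && r % 5 != 0)
            count++;
    return count;
}
```

For p = 332347303 this gives 960 of 4620 classes (96 of 420) over k, and 16 of 120 residues over the factor candidates themselves, so "the class in which a factor was found" means something different in each program.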

ji2my 2016-09-09 06:57

ERROR: cudaGetLastError() returned 8: invalid device function
 
Hi,

I've encountered an error, can anyone help me to solve it?

Thanks!


[CODE]D:\mfaktc>mfaktc-win-64.exe
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPU Sieving               enabled
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  Checkpoints               enabled
  CheckpointDelay           30s
  WorkFileAddDelay          600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  AllowSleep                no
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  6.50
  CUDA runtime version      6.50
  CUDA driver version       8.0

CUDA device info
  name                      GeForce GTX 1070
  compute capability        6.1
  max threads per block     1024
  max shared memory per MP  98304 byte
  number of multiprocessors 15
  clock rate (CUDA cores)   1708MHz
  memory clock rate:        4004MHz
  memory bus width:         256 bit

Automatic parameters
  threads per grid          983040
  GPUSievePrimes (adjusted) 82486
  GPUsieve minimum exponent 1055144

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function

D:\mfaktc>[/CODE]

KaptainBlaZzed 2016-09-09 18:21

I tried this on my 1080 and I get the error
"ERROR: cudaGetLastError() returned 8: invalid device function"

Can you upgrade the program to work with Pascal and the new CUDA architecture?

airsquirrels 2016-09-09 19:27

You must compile for the specific compute version and CUDA version of the card you are using. In this case the 8.0 RC and compute 6.1. Each generation of GPUs requires a separate build.

henryzz 2016-09-09 19:43

[QUOTE=airsquirrels;442058]You must compile for the specific compute version and CUDA version of the card you are using. In this case the 8.0 RC and compute 6.1. Each generation of GPUs requires a separate build[/QUOTE]

???
I have used old binaries with my 750Ti

airsquirrels 2016-09-09 20:59

[QUOTE=henryzz;442060]???
I have used old binaries with my 750Ti[/QUOTE]

Some of the older cards/CUDA versions supported multiple compute versions and architectures, but Maxwell and Pascal both seem to require specific builds.
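
A sketch of the compatibility rule that seems to be at work here, following NVIDIA's documented behaviour (an assumption worth checking against the CUDA version in use): machine code built for compute X.y runs only on devices with the same major version X and a minor version of at least y, while embedded PTX for X.y can be JIT-compiled by the driver for any device of equal or higher compute capability. The struct and function names are hypothetical, and the target list in the usage note is just the example list from earlier in the thread, not necessarily what any released binary contains.

```c
/* Hypothetical description of one code image embedded in a CUDA binary. */
typedef struct {
    int major;   /* compute capability major version, e.g. 5 */
    int minor;   /* compute capability minor version, e.g. 0 */
    int is_ptx;  /* 1 = PTX (JIT-compilable), 0 = machine code for that target */
} embedded_target;

/* Returns 1 if a device with compute capability dev_major.dev_minor can use
   at least one of the n embedded targets, 0 otherwise. */
int device_can_run(const embedded_target *targets, int n,
                   int dev_major, int dev_minor)
{
    int i;

    for (i = 0; i < n; i++) {
        const embedded_target *t = &targets[i];

        if (t->is_ptx) {
            /* PTX: usable on any device of equal or higher capability. */
            if (dev_major > t->major ||
                (dev_major == t->major && dev_minor >= t->minor))
                return 1;
        } else {
            /* Machine code: same major version, minor not below the target. */
            if (dev_major == t->major && dev_minor >= t->minor)
                return 1;
        }
    }
    return 0;
}
```

With machine code for 1.1, 2.0, 3.0, 3.5 and 5.0 only, a 750 Ti (5.0) or GTX 980 (5.2) finds a usable kernel, while a GTX 1070/1080 (6.1) does not, which would match the "invalid device function" reports above. Whether a JIT-compiled kernel would then pass the selftest is a separate question.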

KaptainBlaZzed 2016-09-09 22:46

[QUOTE=airsquirrels;442058]You must compile for the specific compute version and CUDA version of the card you are using. In this case the 8.0 RC and compute 6.1. Each generation of GPUs requires a separate build[/QUOTE]


Can you please tell me how to do this for Windows x64?


All times are UTC.