mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

TheMawn 2015-03-03 01:34

When MISFIT detects a stalled instance, would it be appropriate for it to try to determine if the worktodo.txt is empty, and if it is, automatically transfer work (possibly from worktodo.add) and automatically attempt to restart the instance?

This is assuming the control codes are correctly jiggered.

EDIT: I make this suggestion assuming there is no better way for MISFIT to determine that this is the reason mfaktx died.

EDIT: Again, I don't know if this is the kind of functionality we want at the MISFIT level. Perhaps this is better fixed at the mfaktx level (i.e. the first thing the program does when run is merge worktodo.add into worktodo.txt).
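The merge-at-startup idea could look something like the sketch below. This is a hypothetical illustration, not code from mfaktc or MISFIT; the function name is made up, and only the two filenames come from the thread:

```python
import os

def merge_worktodo(worktodo="worktodo.txt", addfile="worktodo.add"):
    """Append any pending assignments from worktodo.add to worktodo.txt,
    then delete worktodo.add so the same work isn't added twice.
    Returns the number of assignment lines moved."""
    if not os.path.exists(addfile):
        return 0
    with open(addfile) as f:
        lines = [ln for ln in f if ln.strip()]  # skip blank lines
    if lines:
        with open(worktodo, "a") as f:
            f.writelines(ln if ln.endswith("\n") else ln + "\n"
                         for ln in lines)
    os.remove(addfile)
    return len(lines)
```

Doing the merge before reading worktodo.txt would also cover the empty-worktodo.txt case discussed above, since any queued work in worktodo.add becomes visible before the program decides it has nothing to do.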

swl551 2015-03-03 02:17

[QUOTE=TheMawn;396849]When MISFIT detects a stalled instance, would it be appropriate for it to try to determine if the worktodo.txt is empty, and if it is, automatically transfer work (possibly from worktodo.add) and automatically attempt to restart the instance?

This is assuming the control codes are correctly jiggered.

EDIT: I make this suggestion assuming there is no better way for MISFIT to determine that this is the reason mfaktx died.

EDIT: Again, I don't know if this is the kind of functionality we want at the MISFIT level. Perhaps this is better fixed at the mfaktx level (i.e. the first thing the program does when run is merge worktodo.add into worktodo.txt).[/QUOTE]


A primary goal of MISFIT is to never let your installations run out of work. It is possible to configure MISFIT and let everything run for months without human intervention (wait... install those Windows patches every month!). If work is running out, you have not configured MISFIT "correctly" -- use the work calculator.


As for restarting stalled instances... The ONLY time I have had instances stall is due to overclocking, and upon restart I found the clock speed was always a paltry 420 MHz. If MISFIT restarted the instance instead of alarming, you could be running crippled and not know it. Also, if you have lots of stalls, you have something misconfigured with your card or a defective card. It is possible for MISFIT to do more than it does, but coding is a lot of work...

TheMawn 2015-03-03 03:26

Fair enough. To be clear, I am having no issues at all; I haven't had to muck around with the GPUs in weeks. I just saw you mention you were working with MISFIT when you encountered the issue where an empty worktodo.txt would prevent mfaktx from ever running, and I was bouncing ideas around in case you were actively trying to add functionality to MISFIT to deal with that situation.

It's perfectly fine that you don't. In fact in my (very limited) coding experience, I find the "niche" cases to not be worth dealing with.

vsuite 2015-04-03 17:32

What is the advantage of mfaktc .21 over .20 please?

What is the advantage of 6.5 CUDA over 4.2?

Using a GTX 460 running 7.0 (compute capability 2.1) and a 640 running 6.0 (compute capability 3.0).

Thanks.

James Heinrich 2015-04-03 18:05

[QUOTE=vsuite;399291]What is the advantage of mfaktc .21 over .20 please?[/QUOTE][code]version 0.21 (2015-02-17)
- added support for Wagstaff numbers: (2^p + 1)/3
- added support for "worktodo.add"
- enabled GPU sieving on CC 1.x GPUs
- dropped lower limit for exponents from 1,000,000 to 100,000
- rework selftest (-st and -st2), both now test ALL testcases, -st narrowed the searchspace (k_min < k_factor < k_max) to speedup the selftest.
- added random offset for selftest, this might detect bugs in sieve code which a static offset wouldn't find because we always test the same value.
- fixed a bug where mfaktc runs out of shared memory (GPU sieve), might be the cause for some reported (but never reproduced?) crashes. This occurs when you
- have a GPU with relative small amount of shared memory
- have a LOW value for GPUSievePrimes
- have a BIG value for GPUSieveSize
- fixed a bug when GPUSieveProcessSize is set to 24 AND GPUSieveSize is not a multiple of 3 there was a relative small chance to ignore a factor.
- fixed a bug in SievePrimesAdjust causing SievePrimes where lowered to SievePrimesMin for very short running jobs
- added missing dependencies to Windows Makefiles
- (possible) speedups
- funnel shift for CC 3.5 and above
- slightly faster integer division for barrett_76,77,79 kernels
- lots of cleanups and removal of duplicate code
- print per-kernel-stats for selftest "-st" and "-st2"[/code]
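The reworked selftest entries in that changelog (a narrowed search space satisfying k_min < k_factor < k_max, plus a random offset so the sieve is exercised at varying alignments) can be illustrated with a small sketch. This is not mfaktc's actual code; the window and offset sizes here are invented for illustration:

```python
import random

def narrowed_k_range(k_factor, window=1_000_000, max_offset=50_000_000):
    """Pick a narrowed (k_min, k_max) around the k of a known factor,
    shifted by a random amount so repeated runs don't always test the
    same sieve alignment (a static offset could mask sieve bugs)."""
    offset = random.randrange(max_offset)
    k_min = max(0, k_factor - window - offset % window)
    k_max = k_min + 2 * window
    assert k_min < k_factor < k_max  # known factor stays inside the range
    return k_min, k_max
```

Narrowing the range is what makes -st fast (far fewer candidates per test case than a full bit-level scan), while the random shift preserves some of the bug-finding value of testing different starting points.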

Mark Rose 2015-04-03 18:48

[QUOTE=James Heinrich;399293][code] - slightly faster integer division for barrett_76,77,79 kernels[/code][/QUOTE]

That works out to about 1.5% ± 0.5% from what I've seen.

James Heinrich 2015-04-03 20:28

[QUOTE=James Heinrich;399293]- enabled GPU sieving on CC 1.x GPUs
- dropped lower limit for exponents from 1,000,000 to 100,000[/QUOTE]The major feature addition for me was GPU sieving below 2[sup]64[/sup], but that didn't make it to the changelog for some reason.

vsuite 2015-04-03 20:55

Thanks much.

My XP Core 2 Quad reports the .21 win32 app as "not a valid Win32 application". [Also the 5.5, 6.0 and 6.5 CudaLucas builds, but not the 5.0 or 4.2 CudaLucas.] File size doesn't seem to be the problem.

James Heinrich 2015-04-03 21:04

Download again?
[url]http://download.mersenne.ca/mfaktc/mfaktc-0.21/mfaktc-0.21.win.cuda65.zip[/url]

vsuite 2015-04-03 21:14

Thanks again.

preda 2015-04-09 15:08

mfaktc 0.21 selftest fails, CUDA 7.0 on Linux
 
After compiling 0.21 from source with the CUDA 7.0 toolkit, I consistently get this self-test failure (always the same number of tests passed/failed at the end). Any hints are appreciated, thanks.

[CODE]mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 30s
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID (none)
ComputerID (none)
AllowSleep no
TimeStampInResults no

CUDA version info
binary compiled for CUDA 7.0
CUDA runtime version 7.0
CUDA driver version 7.0

CUDA device info
name GeForce GTX 980
compute capability 5.2
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 16
CUDA cores per MP 128
CUDA cores - total 2048
clock rate (CUDA cores) 1215MHz
memory clock rate: 3505MHz
memory bus width: 256 bit

Automatic parameters
threads per grid 1048576
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

running a simple selftest...
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M49635893
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M51375383
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M47644171
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M51038681
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53076719
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M53123843
no factor found
ERROR: selftest failed for M3321928703
no factor found
ERROR: selftest failed for M3321928703
no factor found
ERROR: selftest failed for M3321928703
no factor found
ERROR: selftest failed for M3321928703
no factor found
ERROR: selftest failed for M3321931973
no factor found
ERROR: selftest failed for M3321931973
no factor found
ERROR: selftest failed for M3321928619
no factor found
ERROR: selftest failed for M3321928619
no factor found
Selftest statistics
number of tests 107
successfull tests 51
no factor found 56

selftest FAILED!
random selftest offset was: 12734519[/CODE]

