mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

Chuck 2011-12-22 21:57

Checkpoint overhead?
 
Can someone estimate what the overhead of checkpoints is? I decided several weeks ago to turn them off, as mfaktc and my computer are very stable. On rare occasions I need to reboot the computer, and I might lose an hour of processing time if I am too impatient to wait for the current bitlevels to finish.

I am wondering if a month's overhead of checkpoints is more than an hour of lost work time.

Bdot 2011-12-23 00:10

[QUOTE=Chuck;283235]Can someone estimate what the overhead of checkpoints is? I decided several weeks ago to turn them off, as mfaktc and my computer are very stable. On rare occasions I need to reboot the computer, and I might lose an hour of processing time if I am too impatient to wait for the current bitlevels to finish.

I am wondering if a month's overhead of checkpoints is more than an hour of lost work time.[/QUOTE]

I just timed checkpoints (CPs) on a Windows 7 64-bit laptop (Core i7-M620) with a slow disk.

Per CP:
0.01 ms for creating the checksum (CPU load)
0.2 ms for writing and closing the file
1 ms for remove/rename operations for the backup file (mfakto only; mfaktc just has a remove, ~0.2 ms)
1 ms for committing to disk (fflush)
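
For readers who want to see where those four costs come from, here is a minimal sketch of the same sequence of steps: checksum, write and close, rename to keep a backup, flush. It is not mfaktc's or mfakto's actual code or file format. The function names, the `.tmp`/`.bak` suffixes, the one-line "payload plus CRC32" layout and the use of `os.fsync` are all assumptions made for illustration.

```python
import os
import zlib
from typing import Optional


def write_checkpoint(path: str, payload: str) -> None:
    """Write payload plus a CRC32 checksum, keeping the previous file as a backup."""
    # Step 1: checksum (pure CPU work).
    checksum = zlib.crc32(payload.encode("ascii")) & 0xFFFFFFFF
    tmp_path = path + ".tmp"  # hypothetical naming scheme
    # Step 2: write and close the file, forcing it out to disk first.
    with open(tmp_path, "w", encoding="ascii") as fh:
        fh.write(f"{payload} {checksum:08X}\n")
        fh.flush()             # hand buffered data to the OS
        os.fsync(fh.fileno())  # ask the OS to commit it to disk
    # Step 3: rename operations, so an interrupted write never
    # destroys the last good checkpoint.
    if os.path.exists(path):
        os.replace(path, path + ".bak")
    os.replace(tmp_path, path)


def read_checkpoint(path: str) -> Optional[str]:
    """Return the payload if the file exists and its checksum matches, else None."""
    try:
        with open(path, "r", encoding="ascii") as fh:
            line = fh.readline().rstrip("\n")
    except (OSError, UnicodeDecodeError):
        return None
    payload, sep, stored = line.rpartition(" ")
    if not sep:
        return None
    try:
        stored_value = int(stored, 16)
    except ValueError:
        return None
    if zlib.crc32(payload.encode("ascii")) & 0xFFFFFFFF != stored_value:
        return None
    return payload
```

Calling `write_checkpoint("demo.ckp", "class=100")` twice leaves the newer state in `demo.ckp` and the older one in `demo.ckp.bak`. A reader that gets `None` from the main file could fall back to the backup.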


CPs are written after a class is finished and before more work is loaded on the GPU, so this is "idle time" for the GPU if you just run a single instance. When running several instances per GPU, one instance's checkpoint write overlaps with the other instances' GPU work.

So if you calculate for a single instance with 2 ms per CP, one CP after each class, and 2 seconds per class, then you spend 0.1% of the time writing CPs (this should be pretty much the worst case). 0.1% of one month is ~45 min. If you lose 1 h per month by not writing CPs, you'd already be better off enabling them.
And now you can configure mfaktc to write CPs less frequently. In your case you can set the delay to the maximum (900 s), and it will still write a CP when you abort it with ^C. Then you spend about 6 seconds per month writing CPs.
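
The arithmetic behind those two figures can be checked with a few lines. This is only a sketch under the assumptions stated in the post (2 ms per CP, 2 s per class, 900 s maximum delay) plus one more assumption: a 30-day month. The function and constant names are made up for the example.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def checkpoint_overhead_seconds(cp_cost_s: float, interval_s: float,
                                period_s: float = SECONDS_PER_MONTH) -> float:
    """Total time spent writing checkpoints over period_s,
    with one checkpoint of cost cp_cost_s every interval_s seconds."""
    return period_s / interval_s * cp_cost_s


# Worst case from the post: one 2 ms checkpoint after every 2 s class.
per_class = checkpoint_overhead_seconds(0.002, 2.0)
# Maximum checkpoint delay: one 2 ms checkpoint every 900 s.
max_delay = checkpoint_overhead_seconds(0.002, 900.0)

print(f"one CP per 2 s class: {per_class / 60:.1f} min/month")  # 43.2 min/month
print(f"one CP per 900 s:     {max_delay:.2f} s/month")         # 5.76 s/month
```

With a 30-day month the worst case comes out at 43.2 minutes, in line with the "~45 min" estimate, and the 900 s setting costs just under 6 seconds.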

Anyone still running without checkpoints? :smile:

nucleon 2011-12-23 02:09

hehe ramdisk - and all those problems disappear.

-- Craig

Chuck 2011-12-23 03:14

Thanks, Bdot, that was very helpful. I hadn't looked at checkpoints for some time: before GPUTO72 I was "lumberjacking" in the M600,000,000 range, where a TF run took around a minute (I was using chalsall's MORE_CLASSES-disabled version).

I went with 600 as the checkpoint delay. It's nice that one is taken after a CTRL-C.

chalsall 2011-12-23 03:46

[QUOTE=Chuck;283266](I was using chalsall's MORE_CLASSES disabled version)[/QUOTE]

That wasn't me, Guv.

kladner 2011-12-23 04:12

[QUOTE=chalsall;283272]That wasn't me, Guv.[/QUOTE]

That would have been "mfaktc171apsen.cuda40.sm_multi.LESS_CLASSES", maybe?

Chuck 2011-12-23 13:19

Oh, that's right, chalsall is the GPUTO72 author. Anyway, there was a post somewhere with MORE_CLASSES disabled or LESS_CLASSES enabled, and I picked up the executable and used it for a couple of months.

TheJudger 2011-12-23 15:39

[QUOTE=Chuck;283310]Oh that's right chalsall is the GPUTO72 author — anyway there was a post somewhere with the MORE_CLASSES disabled or LESS_CLASSES enabled and I picked up the executable and used it for a couple of months.[/QUOTE]

I've posted an executable without MORE_CLASSES [URL="http://www.mersenneforum.org/showpost.php?p=273900&postcount=363"]here[/URL] (mfaktc 0.17).

Oliver

Radikalinsky 2011-12-25 03:29

I just found a factor with 0.18:

[QUOTE]M52248761 has a factor: 3708847255636615579439 [TF:70:72*:mfaktc 0.18 barrett79_mul32]
found 1 factor for M52248761 from 2^70 to 2^72 (partially tested) [mfaktc 0.18 barrett79_mul32]
[/QUOTE]

Obviously the PrimeNet server does not yet like the nice new accurate messages from version 0.18.

[QUOTE]
No factor lines found: 0
Mfaktc no factor lines found: 0
Mfakto no factor lines found: 0
Factors found: 1
Processing result: M52248761 has a factor: 3708847255636615579439
Insufficient information for accurate CPU credit. For stats purposes, assuming factor was found using P-1 with B1 = 800000.
CPU credit is 2.4586 GHz-days.
P-1 lines found: 0
LL lines found: 0
Mlucas lines found: 0
Glucas (G29) lines found: 0
Glucas lines found: 0
MacLucasFFTW lines found: 0
CUDALucas lines found: 0
ECM lines found: 0
[/QUOTE]

Edit: OK, I just saw that this is on James Heinrich's to-do list. Sorry.

kladner 2011-12-25 03:35

[QUOTE=Radikalinsky;283459]I just found a factor with 0.18:


Obviously the prime server does not yet like the nice new accurate messages from version 0.18.[/QUOTE]

I saw this once. I think it occurred when I uploaded the result before the second, "end of level" line was generated. As in:

[CODE]M52279247 has a factor: 1525757169405396899617 [TF:70:71:mfaktc 0.18 barrett79_mul32]
found 1 factor for M52279247 from 2^70 to 2^71 [mfaktc 0.18 barrett79_mul32]
[/CODE]
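
For anyone scripting around these result lines, here is a rough sketch of pulling the fields out of the "has a factor" line. The regular expression is inferred only from the sample lines quoted in this thread, not from any mfaktc format specification. Reading the `*` after the upper bit level as "partially tested" is a guess based on the two examples. The names `FactorResult` and `parse_factor_line` are made up for the example.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample lines in this thread (an assumption,
# not an official description of the mfaktc 0.18 output format).
FACTOR_LINE = re.compile(
    r"^M(?P<exponent>\d+) has a factor: (?P<factor>\d+)"
    r"(?: \[TF:(?P<bit_min>\d+):(?P<bit_max>\d+)(?P<partial>\*?)"
    r":(?P<program>\S+) (?P<version>\S+) (?P<kernel>[^\]]+)\])?$"
)


class FactorResult(NamedTuple):
    exponent: int
    factor: int
    bit_min: Optional[int]
    bit_max: Optional[int]
    partial: bool
    program: Optional[str]
    version: Optional[str]
    kernel: Optional[str]


def parse_factor_line(line: str) -> Optional[FactorResult]:
    """Parse one 'M... has a factor: ...' line; return None for any other line."""
    m = FACTOR_LINE.match(line.strip())
    if m is None:
        return None
    has_tag = m.group("bit_min") is not None  # older lines carry no [TF:...] tag
    return FactorResult(
        exponent=int(m.group("exponent")),
        factor=int(m.group("factor")),
        bit_min=int(m.group("bit_min")) if has_tag else None,
        bit_max=int(m.group("bit_max")) if has_tag else None,
        partial=m.group("partial") == "*",
        program=m.group("program"),
        version=m.group("version"),
        kernel=m.group("kernel"),
    )
```

The bracketed tag is optional in the pattern, so a bare `M52248761 has a factor: 3708847255636615579439` still parses, with the tag fields set to `None`. The "found 1 factor ..." summary line is deliberately not matched and returns `None`.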

Radikalinsky 2011-12-25 04:04

@Kladner,
I manually submitted both lines. Maybe it is because the PrimeNet server makes some assumptions for partial tests. But as I understand it, the PrimeNet server just does not yet understand all the details of the mfaktc messages, for both 0.17 and 0.18.

Thanks, Rad


All times are UTC.