mersenneforum.org Help resuming linear algebra step

2019-12-07, 18:48   #1
lavalamp

Oct 2007
Manchester, UK

2²×5×67 Posts

Help resuming linear algebra step

So, my machine had been running linear algebra for six and a half hours, with only 5 minutes to go I think, "It's close enough, I'll play some counter-strike while it finishes." I load the game and ... of course I get a blue screen.

No worries, there are checkpoints right? So I run msieve -s test.dat -l test.log -i test.ini -nf test.fb -t 4 -nc2 again and:

Code:
commencing linear algebra
read 3880716 cycles
cycles contain 12919660 unique relations
read 0 relations
error: cannot locate relation 44290431

Oh. And it wipes the test.dat.mat file for good measure. Luckily I'd made a backup of everything first.

Then I see there are TWO checkpoint files, test.dat.chk and test.dat.bak.chk. So I try using the second checkpoint file and get the same error.

Is there anything I can do to rescue the process at this point without running the entire linear algebra step again?

Last fiddled with by lavalamp on 2019-12-07 at 18:54

2019-12-07, 19:09   #2
EdH

"Ed Hall"
Dec 2009

3,617 Posts

Quote:
 Originally Posted by lavalamp
So, my machine had been running linear algebra for six and a half hours, with only 5 minutes to go I think, "It's close enough, I'll play some counter-strike while it finishes." I load the game and ... of course I get a blue screen.

No worries, there are checkpoints right? So I run msieve -s test.dat -l test.log -i test.ini -nf test.fb -t 4 -nc2 again and:

Code:
commencing linear algebra
read 3880716 cycles
cycles contain 12919660 unique relations
read 0 relations
error: cannot locate relation 44290431

Oh. And it wipes the test.dat.mat file for good measure. Luckily I'd made a backup of everything first.

Then I see there are TWO checkpoint files, test.dat.chk and test.dat.bak.chk. So I try using the second checkpoint file and get the same error.

Is there anything I can do to rescue the process at this point without running the entire linear algebra step again?

If you have a backup of everything, try running -ncr instead of -nc2. That is the resume command for an interrupted process. -nc2 tells it to start nc2 over from the beginning.
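
For anyone scripting this, here is a minimal sketch of the difference between the two invocations. The flags and file names are the ones from the first post; the helper name `msieve_la_command` is made up for illustration and is not part of msieve.

```python
def msieve_la_command(savefile, logfile, inifile, fbfile, threads, resume):
    """Build the msieve argument list for the linear algebra step.

    Hypothetical helper: only the flags quoted in this thread are used.
    """
    # -ncr resumes an interrupted linear algebra run;
    # -nc2 starts the linear algebra step over from the beginning.
    stage = "-ncr" if resume else "-nc2"
    return ["msieve", "-s", savefile, "-l", logfile, "-i", inifile,
            "-nf", fbfile, "-t", str(threads), stage]


resume_cmd = msieve_la_command("test.dat", "test.log", "test.ini",
                               "test.fb", 4, resume=True)
print(" ".join(resume_cmd))
# prints: msieve -s test.dat -l test.log -i test.ini -nf test.fb -t 4 -ncr
```

The list form can be passed straight to `subprocess.run` if you want the script to launch msieve itself.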

2019-12-07, 19:13   #3
lavalamp

Oct 2007
Manchester, UK

2²×5×67 Posts

Quote:
 Originally Posted by EdH If you have a backup of everything, try running -ncr instead of -nc2. That is the resume command for an interrupted process. -nc2 tells it to start nc2 over from the beginning.
Ah yes, that did the trick thank-you.

Seems a bit dangerous that nc2 deletes past progress then fails with an error instead of checking first to be honest. I'm very glad I made a manual backup first.
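
The manual backup can be automated. Below is a rough sketch, assuming the job's side files all share the save file's name as a prefix (test.dat.mat, test.dat.chk, test.dat.bak.chk, as named above). The function name and the `la_backups` folder are hypothetical choices, not anything msieve provides.

```python
import shutil
import time
from pathlib import Path


def backup_job_files(workdir, savefile="test.dat", backup_root="la_backups"):
    """Copy <savefile>.* (matrix and checkpoint files) into a timestamped folder.

    The relation file itself (<savefile>) is deliberately skipped here,
    since it can be very large; copy it separately if you need it.
    """
    workdir = Path(workdir)
    dest = workdir / backup_root / time.strftime("%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(workdir.glob(savefile + ".*")):
        if path.is_file():
            shutil.copy2(path, dest / path.name)  # copy2 keeps timestamps
            copied.append(path.name)
    return dest, copied
```

Calling `backup_job_files(".")` before any `-nc2` or `-ncr` run leaves a copy to fall back on if the wrong flag is used.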

2019-12-07, 19:30   #4
EdH

"Ed Hall"
Dec 2009

111000100001₂ Posts

Quote:
 Originally Posted by lavalamp Ah yes, that did the trick thank-you. Seems a bit dangerous that nc2 deletes past progress then fails with an error instead of checking first to be honest. I'm very glad I made a manual backup first.
Glad to hear it worked. Yeah, other programs like ggnfs and YAFU refuse to run if they find partial results. I have suffered the loss you (temporarily) experienced. I try to remember the "r" these days.
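
The "refuse to run if partial results exist" behaviour can be approximated with a small wrapper check. This is only a sketch under the assumption that the checkpoint files are named as in the first post; `check_fresh_start` and `PartialResultsError` are invented names, and this is not how ggnfs or YAFU implement their checks.

```python
from pathlib import Path


class PartialResultsError(RuntimeError):
    """Raised when a fresh start would run over existing checkpoint files."""


def check_fresh_start(workdir, savefile="test.dat"):
    """Refuse a fresh -nc2 run if checkpoint files are already present."""
    workdir = Path(workdir)
    candidates = (savefile + ".chk", savefile + ".bak.chk")
    leftovers = [name for name in candidates if (workdir / name).exists()]
    if leftovers:
        raise PartialResultsError(
            "found " + ", ".join(leftovers)
            + "; use -ncr to resume, or move these files away first"
        )
```

Run the check before launching `-nc2`; if it raises, either resume with `-ncr` or clear the directory on purpose.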

2019-12-09, 04:23   #5
jasonp

Tribal Bullet
Oct 2004

2×3×19×31 Posts

The code doesn't know the difference between running in a directory with checkpoint files you don't want versus continuing a job with checkpoint files you do want. It would be a nice feature to encode a fingerprint of the factorization into the temporary files generated.
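
One way the fingerprint idea could look, sketched outside msieve as a sidecar file. Everything here is an assumption for illustration: the file name `job.fingerprint`, the choice of SHA-256 over the decimal digits of N, and the function names. msieve does not currently do this.

```python
import hashlib
from pathlib import Path


def fingerprint(n):
    """Hash the number being factored so jobs can be told apart."""
    return hashlib.sha256(str(n).encode("ascii")).hexdigest()


def stamp_job(workdir, n, name="job.fingerprint"):
    """Write the fingerprint next to the job's temporary files."""
    (Path(workdir) / name).write_text(fingerprint(n) + "\n")


def job_matches(workdir, n, name="job.fingerprint"):
    """Return True only if the stored fingerprint belongs to this number."""
    path = Path(workdir) / name
    return path.is_file() and path.read_text().strip() == fingerprint(n)
```

With something like this, a wrapper could resume only when `job_matches` is true and otherwise treat the checkpoint files as leftovers from a different factorization.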
