#1 |
|
(loop (#_fork))
Feb 2006
Cambridge, England
7²·131 Posts |
I ran -nc2 on a 64G machine, and want to transfer the matrix and checkpoint to a faster machine with 32G memory to do the actual linear algebra.
But, on two separate machines and two attempts per machine, including restarting at an earlier checkpoint, I get something like
Code:
Fri Mar 31 20:51:19 2017 commencing Lanczos iteration (6 threads)
Fri Mar 31 20:51:19 2017 memory use: 16253.3 MB
Fri Mar 31 20:51:20 2017 restarting at iteration 626 (dim = 39615)
Fri Mar 31 20:53:58 2017 linear algebra at 0.1%, ETA 1106h20m
Fri Mar 31 20:55:29 2017 error: corrupt state, please restart from checkpoint
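Since the matrix and checkpoint were copied between machines, one cheap thing to rule out is damage in transit. The thread does not establish that this is the cause; the following is only a minimal Python sketch for comparing the files on both sides. The `msieve.dat*` glob pattern and the helper names are assumptions for illustration, so adjust them to whatever files were actually transferred.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks so a
    multi-gigabyte matrix file never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests(directory, pattern="msieve.dat*"):
    """Map file name -> digest for every regular file matching the pattern.
    The default pattern is an assumption, not something msieve dictates."""
    return {
        p.name: sha256_of(p)
        for p in sorted(Path(directory).glob(pattern))
        if p.is_file()
    }


def mismatches(source, copy):
    """Return the names whose digests differ or that exist on one side only."""
    names = set(source) | set(copy)
    return sorted(n for n in names if source.get(n) != copy.get(n))
```

Run `digests()` on each machine, bring the two dictionaries together (for example by printing them), and pass them to `mismatches()`. An empty list means the copies are byte-identical, so the corrupt state must come from somewhere else.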
#2 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2³·3·5·7² Posts |
Is it trying to use a different block size on the 32GB machines?
#3 | |
|
(loop (#_fork))
Feb 2006
Cambridge, England
7²×131 Posts |
Quote:
Code:
tractor (64G)
Sun Apr 2 22:07:29 2017 using block size 8192 and superblock size 983040 for processor cache size 10240 kB
pumpkin (32G, i7-4930K)
Fri Mar 31 20:46:40 2017 sparse part has weight 4557683460 (118.90/col)
Fri Mar 31 20:46:40 2017 using block size 8192 and superblock size 1179648 for processor cache size 12288 kB
butternut (32G, i7-5820K)
Fri Mar 31 20:08:38 2017 using block size 8192 and superblock size 1179648 for processor cache size 12288 kB
Fri Mar 31 20:13:16 2017 commencing Lanczos iteration (6 threads)
Last fiddled with by fivemack on 2017-04-03 at 22:33
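To compare the blocking parameters across machines without reading the logs by eye, a small Python sketch along these lines may help. It only parses the "using block size ..." line format quoted above; the function name is made up for illustration, and whether a superblock mismatch actually causes the "corrupt state" error is not something this thread settles.

```python
import re

# Matches the blocking line exactly as it appears in the logs quoted above.
PATTERN = re.compile(
    r"using block size (\d+) and superblock size (\d+) "
    r"for processor cache size (\d+) kB"
)


def blocking_parameters(log_text):
    """Return (block, superblock, cache_kB) tuples found in msieve log text."""
    return [tuple(int(g) for g in m.groups()) for m in PATTERN.finditer(log_text)]


log = """\
Sun Apr 2 22:07:29 2017 using block size 8192 and superblock size 983040 for processor cache size 10240 kB
Fri Mar 31 20:46:40 2017 using block size 8192 and superblock size 1179648 for processor cache size 12288 kB
"""

for block, superblock, cache_kb in blocking_parameters(log):
    # Print the superblock-to-cache ratio alongside the raw values.
    print(block, superblock, cache_kb, superblock // cache_kb)
```

On the lines quoted here the block size is 8192 everywhere, while the superblock size is 96 times the reported cache size in kB (983040 for 10240 kB, 1179648 for 12288 kB). That is an observation about these logs, not a statement about how msieve computes the value.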
#4 |
|
"Mike"
Aug 2002
2⁵·257 Posts |
We are experiencing the same error when starting a job.
We have tried a binary we compiled ourselves and one from someone else that is known to work. We have also tried several different target densities. FWIW, we ran msieve.dat through "remdups" prior to starting the job.
Code:
Wed Jan 3 13:14:14 2018 commencing linear algebra
Wed Jan 3 13:14:40 2018 read 23031042 cycles
Wed Jan 3 13:15:14 2018 cycles contain 64089728 unique relations
Wed Jan 3 13:49:41 2018 read 64089728 relations
Wed Jan 3 13:52:03 2018 using 20 quadratic characters above 4294917296
Wed Jan 3 13:57:39 2018 building initial matrix
Wed Jan 3 14:09:31 2018 memory use: 8680.3 MB
Wed Jan 3 14:09:40 2018 read 23031042 cycles
Wed Jan 3 14:09:43 2018 matrix is 23026809 x 23031042 (7496.0 MB) with weight 2179834698 (94.65/col)
Wed Jan 3 14:09:43 2018 sparse part has weight 1711677031 (74.32/col)
Wed Jan 3 14:16:50 2018 filtering completed in 3 passes
Wed Jan 3 14:16:54 2018 matrix is 22923906 x 22924106 (7476.6 MB) with weight 2173903730 (94.83/col)
Wed Jan 3 14:16:54 2018 sparse part has weight 1707775293 (74.50/col)
Wed Jan 3 14:17:48 2018 matrix starts at (0, 0)
Wed Jan 3 14:17:51 2018 matrix is 22923906 x 22924106 (7476.6 MB) with weight 2173903730 (94.83/col)
Wed Jan 3 14:17:51 2018 sparse part has weight 1707775293 (74.50/col)
Wed Jan 3 14:17:51 2018 saving the first 48 matrix rows for later
Wed Jan 3 14:17:55 2018 matrix includes 64 packed rows
Wed Jan 3 14:17:57 2018 matrix is 22923858 x 22924106 (7142.7 MB) with weight 1764421261 (76.97/col)
Wed Jan 3 14:17:57 2018 sparse part has weight 1643180859 (71.68/col)
Wed Jan 3 14:17:58 2018 using block size 8192 and superblock size 294912 for processor cache size 3072 kB
Wed Jan 3 14:19:57 2018 commencing Lanczos iteration (2 threads)
Wed Jan 3 14:19:57 2018 memory use: 6034.5 MB
Wed Jan 3 14:23:49 2018 linear algebra at 0.0%, ETA 936h37m
Wed Jan 3 14:25:03 2018 checkpointing every 30000 dimensions
Wed Jan 3 16:46:03 2018 error: corrupt state, please restart from checkpoint
#5 |
|
Sep 2008
Kansas
3376₁₀ Posts |
If you are already into Block Lanczos, one option to try (back up the folder first) is:
Code:
./msieve -v -t 2 -ncr skip_matbuild=1 -nc3
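For the "back up the folder first" step, something like the following Python sketch would do. The function name and the `.backup-<timestamp>` naming scheme are assumptions for illustration only; any full copy of the working directory serves the same purpose.

```python
import shutil
import time
from pathlib import Path


def back_up_folder(work_dir, stamp=None):
    """Copy the whole working directory to a timestamped sibling directory
    and return the path of the copy."""
    work_dir = Path(work_dir).resolve()
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = work_dir.with_name(f"{work_dir.name}.backup-{stamp}")
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(work_dir, target)
    return target
```

Calling `back_up_folder(".")` from inside the job directory before retrying leaves an untouched copy of the matrix and checkpoint files to fall back on. Note that it needs enough free disk space for a second copy of everything.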