mersenneforum.org


frmky 2009-12-15 09:48

Not all columns used
 
[CODE]Sat Dec 12 18:52:20 2009 commencing linear algebra
Sat Dec 12 18:52:24 2009 read 9998652 cycles
Sat Dec 12 18:53:00 2009 matrix is 9998458 x 9998652 (2905.1 MB) with weight 865972158 (86.61/col)
Sat Dec 12 18:53:00 2009 sparse part has weight 651574365 (65.17/col)
Sat Dec 12 18:53:00 2009 saving the first 48 matrix rows for later
Sat Dec 12 18:53:08 2009 matrix is 9998410 x 9998652 (2775.9 MB) with weight 674657217 (67.47/col)
Sat Dec 12 18:53:08 2009 sparse part has weight 627702934 (62.78/col)
Sat Dec 12 18:53:08 2009 matrix includes 64 packed rows
Sat Dec 12 18:53:08 2009 using block size 65536 for processor cache size 3072 kB
Sat Dec 12 18:53:55 2009 commencing Lanczos iteration (4 threads)
Sat Dec 12 18:53:55 2009 memory use: 3054.1 MB
Sat Dec 12 18:54:01 2009 restarting at iteration 129392 (dim = 8182080)
Sat Dec 12 18:55:28 2009 linear algebra at 81.8%, ETA 57h44m
Mon Dec 14 22:21:54 2009 lanczos error (dim = 9998318): not all columns used
Mon Dec 14 22:21:54 2009 lanczos halted after 158116 iterations (dim = 9998318)
Mon Dec 14 22:21:54 2009 linear algebra failed; retrying...
Mon Dec 14 22:21:54 2009 commencing Lanczos iteration (4 threads)
Mon Dec 14 22:21:54 2009 memory use: 3054.1 MB
Mon Dec 14 22:21:54 2009 restarting at iteration 157351 (dim = 9950034)
Mon Dec 14 22:23:09 2009 linear algebra at 99.5%, ETA 1h18m
Mon Dec 14 23:41:38 2009 lanczos error (dim = 9998318): not all columns used
Mon Dec 14 23:41:38 2009 lanczos halted after 158116 iterations (dim = 9998318)
Mon Dec 14 23:41:38 2009 linear algebra failed; retrying...
Mon Dec 14 23:41:38 2009 commencing Lanczos iteration (4 threads)
Mon Dec 14 23:41:38 2009 memory use: 3054.1 MB
Mon Dec 14 23:41:38 2009 restarting at iteration 157351 (dim = 9950034)
Mon Dec 14 23:42:53 2009 linear algebra at 99.5%, ETA 1h18m
Tue Dec 15 01:01:33 2009 lanczos error (dim = 9998318): not all columns used
Tue Dec 15 01:01:33 2009 lanczos halted after 158116 iterations (dim = 9998318)
Tue Dec 15 01:01:33 2009 linear algebra failed; retrying...
Tue Dec 15 01:01:33 2009 commencing Lanczos iteration (4 threads)
Tue Dec 15 01:01:33 2009 memory use: 3054.1 MB
Tue Dec 15 01:01:34 2009 restarting at iteration 157351 (dim = 9950034)
Tue Dec 15 01:02:49 2009 linear algebra at 99.5%, ETA 1h18m[/CODE]

It keeps restarting at the same iteration, 157351, and keeps failing. Is there a chance it will succeed, or should I just start over on this one?

Edit: Or perhaps I'll try disabling that error and seeing if the dependencies are good.
Edit 2: Nope, didn't help:
[CODE]Tue Dec 15 02:32:35 2009 lanczos halted after 158117 iterations (dim = 9998319)
Tue Dec 15 02:33:21 2009 lanczos error: only trivial dependencies found
Tue Dec 15 02:33:22 2009 BLanczosTime: 754
Tue Dec 15 02:33:22 2009 elapsed time 00:12:36[/CODE]
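The percentages in the first log appear to be the current Lanczos dimension divided by the number of matrix columns (9998652). The sketch below reproduces that arithmetic. The function name and the formula are assumptions inferred from the log lines, not taken from the msieve source.

```c
#include <stdint.h>

/* Hypothetical helper: percent of the matrix processed so far,
   assuming progress is simply dim / columns. */
double lanczos_percent_done(uint64_t dim, uint64_t ncols)
{
    if (ncols == 0)
        return 0.0;   /* avoid dividing by zero on an empty matrix */
    return 100.0 * (double)dim / (double)ncols;
}
```

With the numbers from the log, dim = 8182080 falls between 81.8% and 81.9%, and dim = 9950034 between 99.5% and 99.6%, matching the reported figures. The failing dimension, 9998318, is 334 short of the column count.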

frmky 2009-12-16 04:53

Attached is the log file, in case any clues lie in the filtering. Using Serge's matrix dump program, I verified that the matrix has no empty or duplicate columns.

Batalov 2009-12-16 06:17

I know that I know nothing (and most probably even less than Socrates and most other Greeks), but this:
[CODE]Tue Dec 1 23:21:21 2009 found 23495926 cycles, need 10213258[/CODE]
may be a sign of under-removal, after which the algorithm may have wandered into unique territory where debugging will offer a low return (because it will probably never happen again?). It then produced a matrix where the old code usually would have said "matrix can improve, retrying":
[CODE]heaviest cycle: 12 relations[/CODE]
Right? The full merge overdid its job and made the matrix sparse-ski.

Do you think what I think?
[spoiler]removing 20M relns from the top and /sigh/ restarting from -nc1...[/spoiler]

This is only instinct; I am a savage savant in this business.


P.S. Actually, your 2,908+ and 5,383+ also had heaviest cycles of 12 relations, and both turned out fine! (But neither had this overwhelming "found/need" abundance, and neither spent that much time in full merge: barely 15 minutes, not 1.5 days... something was rotten in this state.) Hmmm...

jasonp 2009-12-17 14:14

It's possible for even a bug-free Lanczos implementation to fail with this error; you have to choose a subset of the remaining matrix columns that can form a small invertible matrix, and if there are very few remaining matrix columns that may not be possible. At least that's what I'd say if I had to make up a half-baked reason why it happened. Do you still have the matrix? I can try to take a look. Agree with Serge that the very long merge time is suspicious.
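To make the invertibility condition concrete, here is a minimal sketch of testing whether a small square matrix over GF(2) is invertible, using Gaussian elimination on rows packed into 64-bit words. The function names and the row-per-word layout are illustrative assumptions; this is not how msieve's Lanczos code is organized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rank over GF(2) of n rows, each packed into one 64-bit word
   (bit j of rows[i] is the entry in row i, column j).
   The rows array is modified in place. */
int gf2_rank(uint64_t *rows, size_t n)
{
    size_t rank = 0;
    for (int bit = 0; bit < 64 && rank < n; bit++) {
        uint64_t mask = (uint64_t)1 << bit;
        size_t pivot = rank;
        while (pivot < n && !(rows[pivot] & mask))
            pivot++;
        if (pivot == n)
            continue;                    /* no pivot in this column */
        uint64_t tmp = rows[pivot];      /* move pivot row into place */
        rows[pivot] = rows[rank];
        rows[rank] = tmp;
        for (size_t r = 0; r < n; r++)   /* clear this column elsewhere */
            if (r != rank && (rows[r] & mask))
                rows[r] ^= rows[rank];
        rank++;
    }
    return (int)rank;
}

/* An n x n matrix over GF(2), n <= 64, is invertible iff its rank is n.
   Only the low n bits of each row are considered. */
int gf2_is_invertible(const uint64_t *m, size_t n)
{
    uint64_t work[64];
    uint64_t low;
    assert(n <= 64);
    low = (n == 64) ? ~(uint64_t)0 : (((uint64_t)1 << n) - 1);
    for (size_t i = 0; i < n; i++)
        work[i] = m[i] & low;
    return gf2_rank(work, n) == (int)n;
}
```

For example, the 3 x 3 identity (rows 1, 2, 4) is invertible, while rows 3, 5, 6 are not, because 3 XOR 5 = 6. When few columns remain, there are few candidate subsets to choose from, so a full-rank one may not exist.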

That being said, these sorts of errors have become gratifyingly rare; in fact this is the first one I can remember seeing in the last year or so. Preprocessing the matrix has really improved in latter-day msieve versions to cut down on Lanczos failures.

Could you post the filtering results for the rerun too?

frmky 2009-12-17 18:57

Here's the log of the rerun with a more normal amount of relations.

frmky 2009-12-17 19:06

And for your pleasure, here's a filtering run with an even more absurd number of relations. 32-bit LPs were used (as a test) on a number where 31-bit LPs would have been more appropriate. I just let full merge run for a long time, and it produced a very light matrix. For this run, the filtering source was modified to remove up to 2M cliques at a time rather than 400K, but an earlier run that used 400K had the same results until it was interrupted by an unexpected reboot. I'm of course rerunning this with fewer relations.

jasonp 2009-12-18 02:14

The second case basically ran out of 2-way cliques and still had a huge amount of excess left over. The merge phase only had to build a small matrix, but once it created that matrix it realized the matrix was very large and extremely sparse, so it kept merging to reduce the matrix size. Every time it decided to do 2000 more merges, it forgot about the 500 heaviest ideals. Eventually, after a few days, it ran out of ideals to merge; in that time the forgotten heaviest ideals added up to 18M, which is far more than necessary.

I guess the full merge should only forget about the heaviest ideals if they really are heavy. That won't make the merging any faster (in fact the opposite will happen :) but at least merging will not end prematurely.

Greg, would you be willing to rerun if I point out the change to make?

frmky 2009-12-18 02:48

[QUOTE=jasonp;199172]
Greg, would you be willing to rerun if I point out the change to make?[/QUOTE]

Sure. This number was meant to be a test anyway. I'm up for more testing.

And look for a PM concerning the 6p323 matrix.

jasonp 2009-12-18 03:47

On line 510 of common/filter/merge.c, replace
[code]
for (i = 0; i < 500; i++) {
[/code]
with
[code]
for (i = 0; inactive_heap.num_ideals > 0 &&
            inactive_heap.worst_bin > 100 && i < 500; i++) {
[/code]
It would be nice to not have to spend days getting to the point where the merging would run :(
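As a rough illustration of what the extra loop conditions do, the toy below applies the same guard to a stand-in heap. Only the field names `num_ideals` and `worst_bin` come from the patch; the struct, the loop body, and the reading of `worst_bin` as the weight of the current heaviest ideal are assumptions made for this sketch, not msieve's actual data structures.

```c
#include <stddef.h>

/* Toy stand-in for the heap of inactive ideals (hypothetical layout). */
struct toy_heap {
    const unsigned *weights;  /* ideal weights, sorted heaviest first */
    size_t next;              /* index of the current heaviest ideal */
    size_t num_ideals;        /* ideals still in the heap */
    unsigned worst_bin;       /* weight of the current heaviest ideal */
};

void toy_heap_init(struct toy_heap *h, const unsigned *weights, size_t n)
{
    h->weights = weights;
    h->next = 0;
    h->num_ideals = n;
    h->worst_bin = n ? weights[0] : 0;
}

/* Forget up to 500 ideals, but only while the heaviest one remaining
   is really heavy (weight > 100). Returns how many were forgotten. */
size_t toy_forget_heaviest(struct toy_heap *h)
{
    size_t i;
    for (i = 0; h->num_ideals > 0 &&
                h->worst_bin > 100 && i < 500; i++) {
        h->next++;
        h->num_ideals--;
        h->worst_bin = h->num_ideals ? h->weights[h->next] : 0;
    }
    return i;
}
```

With weights 300, 150, 101, 100, 40, the guarded loop forgets three ideals and stops at the one of weight 100, whereas the original `i < 500` header would keep going regardless of weight.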

frmky 2009-12-18 04:53

I made the change, and it is running now with all 580M relations.

In the meantime, I trimmed nearly 90M relations and then ran the filtering. It behaved more normally. The log is attached. In the log, you will see the changes made to remove 2M cliques at a time while more than 200M relations are left, then 1M cliques while more than 100M relations remain, then the usual 400K. This seems to work well and significantly speeds up the filtering.
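The tiered schedule described above can be sketched as a small helper. The function name is hypothetical and the thresholds are the ones quoted in this post; the actual modification to the filtering source is not shown in the thread.

```c
#include <stdint.h>

/* Hypothetical helper: maximum number of cliques to remove in one pass,
   given how many relations are still left. */
uint32_t clique_batch_size(uint64_t relations_left)
{
    if (relations_left > 200000000ULL)
        return 2000000;   /* more than 200M relations: 2M cliques */
    if (relations_left > 100000000ULL)
        return 1000000;   /* more than 100M relations: 1M cliques */
    return 400000;        /* otherwise the usual 400K */
}
```

Larger batches early on mean fewer passes over a very large relation set, and the batch shrinks back to the default as the set gets closer to its final size.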

frmky 2009-12-30 21:48

[QUOTE=jasonp;199181]On line 510 of common/filter/merge.c, replace
[/QUOTE]
That change made essentially no difference.

