Lanczos with >64 dependencies
1 Attachment(s)
The attached is a replacement for the code in common/lanczos, and implements Lanczos iteration capable of finding >64 dependencies. If you feel brave, unzip the attached into common/lanczos and add common/lanczos/lanczos_vv.c to COMMON_SRCS in the makefile.
Right now it's configured for 128 dependencies; you can change VBITS in lanczos.h and compile for 64, 128, 192, or 256 dependencies. Note that the patch for local memory allocation is not included, because that was added later. Also, savefiles generated with different VBITS values are incompatible with each other. SSE2 instructions are used for VBITS=128, and MMX is used for all others. The code works but is currently very messy, and some of the assembly code still needs to be converted.

The results I have so far are as follows:

- On a 1.86GHz core2duo running winxp, a 212k matrix solves in 7:52 with VBITS=64 and about 11:00 with VBITS=128. The latter time goes down to 8:22 if you use the experimental port of gcc 4.3 to minGW, as opposed to the older crappy gcc 3.4.5 that is the default.
- On a 2GHz opteron system running linux, a 217k matrix solves in 11:10 with VBITS=64 and in 9:59 with VBITS=128, so on this machine it actually is a little bit faster to find more dependencies.

So I guess the first thing to find out is whether your opteron and core2duo systems actually benefit from this patch on big problems, and whether the speedup is worth the extra memory (10-15% more for VBITS=128). Please don't use this patch for production work; it will need many changes before it's ready for that. |
[QUOTE=jasonp;135931]
So I guess the first thing to find out is whether your opteron and core2duo systems actually benefit from this patch on big problems, and whether the speedup is worth the extra memory (10-15% more for VBITS=128). Please don't use this patch for production work, it will need many changes before it's ready for that.[/QUOTE] On the Barcelona, VBITS=128 seems to help only a little. Here are the completion times in hours (calculated from a few minutes' runtime) for a 1.62M square matrix, for version 1.36 and for the new code with VBITS set to each value, at 1, 2, 4, and 8 threads. [CODE]
          1     2     4     8
1.36   15.8   9.7   6.5   5.6
64     17.4  10.5   6.7   5.8
128    16.5   9.7   6.6   6.2
192    22.3  14.9  11.9  11
256    24.5  15.9  12.8  13.2
[/CODE] Greg |
Thanks. Presumably the times would get a little better once all the assembly code (i.e. for the vector-vector operations) was converted, but it looks like the most you can expect is parity with the unmodified code.
OTOH, as the number of dependencies goes up, the block size may go down (to keep the wider vectors in cache), which would increase the runtime. |
Uhm, so how much trouble am I in when the msieve matrix step
reports [code] linear algebra completed 15554069 out of 15479770 dimensions (100.5%) [/code] Any chance that running a bit longer will give dependencies? Or is it time to restart, from the filtering step? -Bruce |
[QUOTE=bdodson;136684]Uhm, so how much trouble am I in when the msieve matrix step
reports [code] linear algebra completed 15554069 out of 15479770 dimensions (100.5%) [/code] Any chance that running a bit longer will give dependencies? Or is it time to restart, from the filtering step? -Bruce[/QUOTE] I regret to inform you that I've never had a successful linalg run that got beyond 100%. You might want to let it run 1% further in the hope of a miracle, but then restart from filtering, and maybe increase the matrix-weight parameter. |
[QUOTE=bdodson;136684]Uhm, so how much trouble am I in when the msieve matrix step
reports [code] linear algebra completed 15554069 out of 15479770 dimensions (100.5%) [/code] Any chance that running a bit longer will give dependencies? Or is it time to restart, from the filtering step? -Bruce[/QUOTE] Nobody has managed to get dependencies out of a matrix when this has happened to them. Do you still have the logs from when the matrix was created? It's possible the matrix was too sparse, or there was a hardware error. You can give it a little longer, or restart from the last checkpoint, but eventually I suspect you should restart from the filtering step. Msieve v1.36 should yield a matrix with >= 63 nonzeros per column. Sorry this had to happen... |
[QUOTE=fivemack;136685]I regret to inform you that I've never had a successful linalg run that gets beyond 100%. You might want to let it get 1% further out of hope for a miracle, but then restart from filtering and maybe increase the matrix-weight parameter.[/QUOTE]
I have seen it with the CWI solver as well. It happens either when the matrix is too sparse or when, usually through an oversight, duplicate entries remain in the final matrix. |
[QUOTE=bdodson;132769]Leaving aside the first 48 matrix rows,
3,536+ "sparse part has weight 874437208 (61.45/col)"
10,257- "sparse part has weight 858020543 (61.56/col)"
10,257+ "sparse part has weight 952964006 (61.56/col)"
..... PS -- with the first 48 rows, that was
"weight 1266010783 (88.97/col) sparse part 898702280 (63.16/col)"
"weight 1239088034 (88.90/col) sparse part 879366766 (63.09/col)"
"weight 1382748757 (89.33/col) sparse part 983086477 (63.51/col)" [/QUOTE] This is the 3rd one, 10,257+, not particularly sparse. I also had a matrix that I re-ran with the CWI suite. I'll hang in for a while (Greg seems less worried). -Bruce |
[QUOTE=bdodson;136736]This is the 3rd one; 10,257+, not particularly sparse. I also had a matrix
that I re-ran with the CWI suite. I'll hang in for a while (Greg seems less worried). -Bruce[/QUOTE] Can we therefore assume that the LA for 10,257+ failed? How is the other work progressing? Enquiring minds want to know. :smile: |
[QUOTE=R.D. Silverman;136761]Can we therefore assume that the LA for 10,257+ failed?
How is the other work progressing? Enquiring minds want to know. :smile:[/QUOTE] I'll hang on for a week-or-so (to c. 110%) before restarting 10,257+, unless someone is certain that the msieve dimension counter passing 100% is conclusive for matrix failure. The other two matrices running here are 3,547-c242 at 70% and 7,311+ C258 at 10%. Greg's running 2,949+ C189 (after a p57 by ecm in final pretesting) out at Cal, Fullerton. 7,313+ C264 just finished (over-) sieving; I'm shipping 189M unique relns later today. 7,313- C248 has finished 5M of the 30M range of qs (20M - 320M), and 6,392+ c262 is just about done with ecm. Looks like both NFSNET and Childers/Dodson are due to pick some new next numbers. On the issue of whether p54 ought to be regarded as an ecm miss, the recent one from 10,257-; the fact that there was a composite cofactor makes any issue nearly moot --- cf. the p57 of 2,949+ above, we had to sieve the cofactor anyway. With the exception of the last 20M range of qs added in over-sieving on 313+, calendar sieving time was under 10 days, with a 30M range of qs every 24 hours, except when I ran a last 2*t50 on 313-. That also took 24 hours, 3100 curves with B1=260M, p60-optimal. So 6*t50 = c. t55 takes 3*24 hours; fully 30% of the sieving time (identical hardware, two beowulf clusters, Opteron and Intel/quadcore). To "remove to 80%" (Peter's term) an ecm p55 factor would take 60% of the sieving time (for 2*t55), clearly too long. Even tests like 7*t50, or 8*t50, as recently run, are up near 1.5*t55, past 45% of the sievetime. Especially on these very large snfs's, where a composite cofactor is still the same snfs, the current balance between ecm and sieving (much less matrix) suggests that we ought to accept an occasional p54/p55, as within the probability expectation of ecm --- running a bit past 62% may make sense, running to 80% (to "remove" factors of that size) doesn't. 
If/when there are lots of pc/grid cycles, on machines that don't support sieving, we may be spoiled by having extra ecm available, outside of constraints on the runtime comparison with sieving; but that seems no longer to be the case here (with condor sharply reduced on public machines). -Bruce (Thanks for asking.) |
[QUOTE=bdodson;136766]
<snip> Especially on these very large snfs's, where a composite cofactor is still the same snfs, the current balance between ecm and sieving (much less matrix) suggests that we ought to accept an occasional p54/p55, as within the probability expectation of ecm (Thanks for asking.)[/QUOTE] We are, as they say, in violent agreement. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.