
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Msieve with GNFS support (https://www.mersenneforum.org/showthread.php?t=5413)

jasonp 2008-04-15 13:23

[QUOTE=bdodson;131579]so -ncr didn't work, of
course, but -nc2 picked up from the last cycle write, and went
directly to the two 14.2M^2 matrices (with and without the first
48 rows; ... uhm, that's "read the cycles; read the relns; re-computed
the quad characters" then went directly to the matrix).
[/QUOTE]
That's as expected; -nc2 builds the matrix and quadratic characters from scratch every time, and writes the result to disk. The removal of the densest rows happens on the in-memory image only, not on disk. IIRC the CWI suite adds in the quadratic characters after the matrix is solved.
[QUOTE]
Does "interrupt" mean something like kill -TERM pid, or kill -TSTP pid?
I'm not seeing anything on my "console", as I'm logging in remotely,
running

(./msieve -v -t 4 -nc2 ) > ms2.mat3 &

and logging off. The stdout (stderr?) has only dumped twice so far: once
when the buffer for msieve.log hit 3200-or-so bytes during filtering, and
again after the reboot, when msieve hit the first checkpoint (!); but
msieve.log itself hasn't said anything yet ... as I was saying, some of
the logfiles posted here say

"starting lanczos, memory xxx",

then nothing for 10 days or two weeks, until the Lanczos finish.[/QUOTE]
Most folks here run msieve in (the foreground of) a terminal window, and in that case you get a running tally of the number of matrix dimensions solved and the percentage of the total completed so far. If you redirect the console and error output into files, the system will buffer those writes, so you won't see them until a buffer fills. The logfile itself doesn't get any of that output because it could be huge.

'Interrupting' a run in this context means SIGTERM or SIGINT (ctrl-C is the usual method).

bdodson 2008-04-16 15:50

[QUOTE=Wacky;131541]You should be somewhat better off than I am on 12,241-. Mine has
[code]matrix is 18720804 x 18721052 (5170.3 MB) with weight 1285924286 (68.69/col) ...[/code]

It is currently at 47.4% and will need most of an additional month to finish.

You should be able to make a reasonable projection if you can see the intermediate progress report on the "console".[/QUOTE]

OK, so now I have a console. Looks like 354 "dimensions" per minute.
(Is that something like the number of rows processed? "Dimension" is
otherwise a reserved term in linear algebra that doesn't apply here. I'm
used to "iterations" from the cwi lanczos, but that's something else yet.)

So anyway, 14.2M/354 minutes would be just under 28 days. Thanks,
that's much clearer (also, I'm about to hit 5%). -Bruce
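The projection above is easy to check; as a sketch, using only the figures quoted in the posts (14.2M matrix dimensions, 354 per minute on the console):

```python
# Back-of-the-envelope runtime projection from the console rate,
# using the numbers quoted in the thread.
dims_total = 14.2e6      # matrix dimension
dims_per_min = 354       # observed rate on the "console"

minutes = dims_total / dims_per_min
days = minutes / (60 * 24)
print(f"projected runtime: {days:.1f} days")  # just under 28 days
```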

jasonp 2008-04-16 17:17

[QUOTE=bdodson;131670]OK, so now I have a console. Looks like 354 "dimensions" per minute.
(Is that something like the number of rows processed? "Dimension" is
otherwise a reserved term in linear algebra that doesn't apply here. I'm
used to "iterations" from the cwi lanczos, but that's something else yet.)
[/QUOTE]
I use the D-word because (as I understand it) the distance from a random starting vector to a vector in the nullspace of an MxN matrix can be thought of as an M-dimensional quantity, and every block Lanczos iteration removes a few dozen components of that error. Because the number of error components removed varies with each iteration it's more accurate to report the number of components processed, even though for B-bit vectors the number removed per iteration seems to be very close to (B-0.73) on average.
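As a toy illustration of that accounting (the function, its name, and its defaults are just a sketch of the estimate in the post, not msieve code):

```python
def expected_iterations(n_dims, block_bits=64, avg_loss=0.73):
    """Rough block Lanczos iteration count: each iteration removes
    about (block_bits - avg_loss) error components on average."""
    return n_dims / (block_bits - avg_loss)

# e.g. a 14.2M-dimension matrix solved with 64-bit vectors:
iters = expected_iterations(14.2e6)
print(f"about {iters:,.0f} iterations")  # roughly 224,000
```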

Chris Card 2008-04-16 17:53

[QUOTE=jasonp;131677]I use the D-word because (as I understand it) the distance from a random starting vector to a vector in the nullspace of an MxN matrix can be thought of as an M-dimensional quantity, and every block Lanczos iteration removes a few dozen components of that error. Because the number of error components removed varies with each iteration it's more accurate to report the number of components processed, even though for B-bit vectors the number removed per iteration seems to be very close to (B-0.73) on average.[/QUOTE]
Isn't distance 1-dimensional?

Chris

jasonp 2008-04-16 18:14

[QUOTE=Chris Card;131681]Isn't distance 1-dimensional?
[/QUOTE]
Oops, yes. I was thinking of the process of removing components from an error vector.

bdodson 2008-04-16 18:30

[QUOTE=jasonp;131677]I use the D-word because (as I understand it) the distance from a random starting vector to a vector in the nullspace of an MxN matrix can be thought of as an M-dimensional quantity, and every block Lanczos iteration removes a few dozen components of that error. Because the number of error components removed varies with each iteration it's more accurate to report the number of components processed, even though for B-bit vectors the number removed per iteration seems to be very close to (B-0.73) on average.[/QUOTE]

I'm glad to hear that you have better thoughts on how Lanczos
works than I've heard elsewhere. Any chance that you're considering
the distributed Wiedemann that CADO-NFS and others are looking at
(and/or using)?

Of more immediate interest: if having 4 threads on each of two
cores of a quadcore (if that's the correct description of what -t 8
would do?) seems like too many packed into a small number of processors,
would there be a way to get, say, 3-4 threads on one (core of one)
quadcore, with another 3-4 on a different quad? Or maybe 2 threads
on each of 4 cores, not all on the same quad? Three weeks would be
better than four, and there seem to be at least six idle cores; more
if I do more of the 10,257- sieving on the other clusters.

Speaking of which, I'm past the halfway point on sieving for 10m257,
and will be reduced to waiting for stragglers before too much longer.
Suppose I'd want to start on another soon, but at (19+3+ ...) GB/number
of space we'll run out of disk soon. Guess two msieves, one on each
number would get me up to 8 threads, if the disk space holds out ...
-Bruce

bdodson 2008-04-16 18:32

[QUOTE=Chris Card;131681]Isn't distance 1-dimensional?

Chris[/QUOTE]

Yes, but he's reducing the dimension of the ambient space in which
the (linear) distance occurs. -bd

bdodson 2008-04-16 19:11

[QUOTE=bdodson;131687] ...Any chance that you're considering
the distributed Wiedemann that CADO-NFS and others are looking at
(and/or using)?

Of more immediate interest, if having 4 threads each on two
cores of a quadcore ...[/QUOTE]

Actually, isn't this what Wiedemann is supposed to do? Distribute
the matrix computation into pieces (each with the complete matrix iirc),
then run 4 threads on each piece? -bd

jasonp 2008-04-16 20:00

[QUOTE=bdodson;131697]Actually, isn't this what Wiedemann is supposed to do? Distribute
the matrix computation into pieces (each with the complete matrix iirc),
then run 4 threads on each piece? -bd[/QUOTE]
Yes, block Wiedemann allows that; each cluster starts off with a different power of the matrix and computes a completely independent chain of matrix-vector products, with a combining step at the end. I think this is the wave of the future but the theory behind block Lanczos is hard enough for me already. There was another user here who was working on a block Wiedemann implementation, using the matrix-vector product code from GGNFS, but the last time we heard from him he had a long way to go.
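A toy sketch of why those chains are independent, in pure Python over GF(2). Everything here (the random matrix, the names, and the split by different right-hand vectors) is illustrative only; this is not GGNFS or CADO-NFS code, and the real distribution hands each cluster a different power of the matrix rather than literally separate right-hand blocks as below.

```python
import random

random.seed(1)
n = 8
# An n x n GF(2) matrix stored as one n-bit integer per row;
# vectors are single n-bit integers.
M = [random.getrandbits(n) for _ in range(n)]
x = random.getrandbits(n)

def matvec(M, v):
    """M*v over GF(2): bit i of the result is the parity of row_i AND v."""
    out = 0
    for i, row in enumerate(M):
        out |= (bin(row & v).count("1") & 1) << i
    return out

def chain(M, x, y, length):
    """One worker's share: the scalar sequence a_i = x^T M^i y (mod 2).
    Nothing here depends on any other worker's chain."""
    seq, v = [], y
    for _ in range(length):
        seq.append(bin(x & v).count("1") & 1)
        v = matvec(M, v)  # advance the Krylov sequence
    return seq

# Two "clusters" compute their chains with zero communication;
# only the final combining step needs all the sequences together.
seqs = [chain(M, x, random.getrandbits(n), 2 * n) for _ in range(2)]
```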

The CADO group has not yet released their code.

I have most of the changes in place to implement block Lanczos with a variable number of dependencies, and as expected the changes were really messy and the assembly code will need a lot of adapting. No timing results yet (except on tiny jobs), but the overhead of more than 64 dependencies appears quite high, so the library should probably include multiple block Lanczos subsystems, each tuned for a different number of dependencies.

Sheesh, this seems to be an arms race between me tuning the code and everyone throwing ridiculously large problems at it.

PS: Bruce, the scheduling of worker threads is completely up to the OS, barring vendor-specific tools that allow processor affinity to be adjusted manually. Msieve now uses long-lived threads, so for a long enough job pinning threads to cores is feasible.
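On Linux, that kind of manual pinning can be sketched with the sched_setaffinity(2) wrapper in Python's standard library. The helper below is hypothetical and not anything msieve does by itself; it just shows what "pinning to a core" means at the OS level.

```python
import os

def pin_to_core(core):
    """Restrict the calling process to a single CPU core (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})   # pid 0 means the calling process
        return os.sched_getaffinity(0)    # the affinity mask now in effect
    return None  # platform without sched_setaffinity (e.g. macOS)

# e.g. pin this process to core 0 before starting a long matrix run:
mask = pin_to_core(0)
```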

xilman 2008-04-16 20:12

[QUOTE=jasonp;131703]Sheesh, this seems to be an arms race between me tuning the code and everyone throwing ridiculously large problems at it.[/QUOTE]Welcome to the real world :wink:

It was ever thus, or at least it has been for the last sixty years. Those who want to use computers / storage / IO / algorithms / implementations inevitably use as much as is available at the bleeding edge. Those who build such resources are always pressured by the users.

I suffer exactly the same pressures in my day job, even though it's a radically different field of endeavour. Job satisfaction is, in large part, being able to meet those expectations. From your achievements so far, I'd estimate that you're getting a lot of job satisfaction.


Paul

Wacky 2008-04-16 23:40

[QUOTE=xilman;131706]Welcome to the real world :wink:

It was ever thus, or at least it has been for the last sixty years anyway. Those who want to use … inevitably use as much as is available at the bleeding edge. Those who build such resources are always pressured by the users.

Job satisfaction is, in large part, being able to meet those expectations. From your achievements so far, I'd estimate that you're getting a lot of job satisfaction.[/QUOTE]

Paul,

I can only hope that Jason is recognizing his deserved "job satisfaction". His efforts certainly justify the recognition. (And, yes, I will be among those who "press the limits" hoping for even more.)

Wacky

