mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Msieve with GNFS support (https://www.mersenneforum.org/showthread.php?t=5413)

Wacky 2008-02-20 05:00

Some "interesting" observations:

I am comparing Greg's just completed run with the one that I have in progress.

He ran 292.69 hours to process "matrix is 9862941 x 9863188 with weight 664531270 (avg 67.37/col)"

I am estimating 543 hours to process "matrix is 13558127 x 13558374 (3675.6 MB) with weight 906838153 (66.88/col)"

The hardware is similar. Dual dual-core Xeon processors each running 4 threads.

I compute the following ratios:

Weight 1.364628263467572
Matrix 1.37464418198254
Number Threads 1.0
CPU Speed 1.127819548872181

Interestingly, if you multiply his execution time by the ratio of weights and the ratio of matrix sizes (number of iterations), the result is remarkably close to my projected execution time.

Given his added processor speed, I believe that it is bus speed rather than processor speed that limits the LA.

Since we have only a limited number of machines in the class necessary to handle a matrix of this size, and since the execution time is non-trivial, are we reaching the optimum filtering point? For problems of this class, have we analyzed the tradeoff of instead using a smaller but denser matrix? (Assuming that we still have sufficient memory to avoid swapping.)

What about using 128- or 256-bit blocks? I suspect that going from 64-bit to 128-bit blocks would save only 10%-20% in processing (through smaller weight and an insignificant reduction in matrix size) but would require at least 150% of the processing time.

frmky 2008-02-20 08:08

[QUOTE=Wacky;126211]
Given his added processor speed, I believe that it is bus speed rather than processor speed that limits the LA.
[/QUOTE]
Supporting this is the observation that while running apparently CPU-bound tasks on this computer, such as ECM or lattice sieving, top reports a load of 4.00, while during msieve's LA top reports 3.75. On average, about 1/4 of a core goes unused waiting for data.

[QUOTE=Wacky;126211]
Since we have only a limited number of machines in the class necessary to handle a matrix of this size, and since the execution time is non-trivial, are we reaching the optimum filtering point? For problems of this class, have we analyzed the tradeoff of instead using a smaller but denser matrix? (Assuming that we still have sufficient memory to avoid swapping.)

What about using 128- or 256-bit blocks? I suspect that going from 64-bit to 128-bit blocks would save only 10%-20% in processing (through smaller weight and an insignificant reduction in matrix size) but would require at least 150% of the processing time.[/QUOTE]

I seem to recall Jason mentioning that changing the final density of the matrix is a simple change in the source. A new computer has now been ordered (that took forever), so I will soon have lots of local CPU time available to try different densities. I'll explore it then.

Greg

fivemack 2008-02-20 10:53

[quote]I suspect that going from 64-bit to 128-bit blocks would save only 10%-20% in processing (through smaller weight and an insignificant reduction in matrix size) but would require at least 150% of the processing time.[/quote]

I would be surprised if it required very much more processing time, since the Core2-class processors claim to be able to do a 128-bit load-from-cache, store-to-cache or XOR in the same time that a 64-bit operation takes.

A very basic blocked matvec takes, on a nearby core2, 32 ticks per matrix entry to operate on __m128i entries and 24 ticks per matrix entry on unsigned long long entries (for a 2^20-square random matrix of density 100 - I haven't tried to model the dense columns), which sounds like a 30% speed-up since you only have to do half as many 128-bit matvecs.

Using a 256-bit structure as the vector entry gave about 48 ticks per matrix entry for the matvec on a 2^20-square matrix and about 58 ticks on a 3*2^19-square matrix; I note that the Aoki et al large GNFS runs from 2003-5 used 256-bit-wide block Lanczos.

henryzz 2008-02-21 09:22

At how many digits does msieve's GNFS become faster than its QS?

jasonp 2008-02-21 13:43

[QUOTE=henryzz;126332]At how many digits does msieve's GNFS become faster than its QS?[/QUOTE]
The threshold is quite high, because the NFS polynomial selection and sieving are quite basic. I've never explored the answer to that; my guess is around 120 digits. The crossover point between GGNFS and msieve's QS is somewhere between 95 and 100 digits.

Andi47 2008-02-21 14:18

I have already tried 120 digits (this was a ~120-digit QS factorization about one year ago) - QS is considerably faster. I don't know by how much, because I gave up the GNFS factorization.

I got the subjective impression that the gap between msieve's GNFS and its QS widens with more digits.

GGNFS vs. msieve/QS: I usually switch to GGNFS sieving at 100 digits - with msieve postprocessing, of course.

henryzz 2008-02-21 17:22

[quote=Andi47;126349]I have already tried 120 digits (this was a ~120-digit QS factorization about one year ago) - QS is considerably faster. I don't know by how much, because I gave up the GNFS factorization.

I got the subjective impression that the gap between msieve's GNFS and its QS widens with more digits.

GGNFS vs. msieve/QS: I usually switch to GGNFS sieving at 100 digits - with msieve postprocessing, of course.[/quote]
How long does 100 digits take with factlat.pl?

jasonp 2008-02-21 19:07

[QUOTE=henryzz;126367]How long does 100 digits take with factlat.pl?[/QUOTE]
I've seen reports of 4-8 hours depending on the machine used. Msieve's QS needs ~12 hours for a random 100-digit job on 2004-era hardware.

henryzz 2008-02-23 15:04

How can I tell how many relations the factlat.pl script needs before it will stop sieving?

fivemack 2008-02-23 15:21

If you could easily tell in advance how many relations would be needed for a given problem, factlat wouldn't have to go through its cycle of getting some relations, processing them, checking whether it had enough, getting more, and so on.

A fair guess is that the number of unique relations needed depends mostly on the large-prime bound:

[code]
lpb   unique relations needed
 30   80M-100M
 29   45M-50M
 28   20M-25M
 27   somewhere around 7-8M
 26   around 2.5M
 25   around 1.5M
[/code]

but there are all sorts of second-order contributions. The two small SNFS jobs I've just done needed 1559901 relations (lp=25) and 2532675 (lp=26).

henryzz 2008-02-23 15:33

In def-pars.txt it says that lpbr and lpba are both 26 - is this what you are talking about?
If so, I have 8M relations, and surely that would be a bit much for 26.

