mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Msieve with GNFS support (https://www.mersenneforum.org/showthread.php?t=5413)

jasonp 2008-04-17 04:41

Seeing people solve problems that were unimaginably huge to me even a little while ago has been completely toxic to my humility.

I finally got a chance to adapt most of the assembly code and ran a medium-size test. Solving a 212k matrix takes 11.25 minutes if done with 128-bit vectors and takes 7.9 minutes if done with 64-bit vectors. I guess it still needs work.

bdodson 2008-04-25 16:36

[QUOTE=bdodson;131687]Speaking of which, I'm past the halfway point on sieving for 10m257, ...

Suppose ... we'll run out of disk soon. Guess two msieves, one on each
number would get me up to 8 threads, if the disk space holds out...-Bruce[/QUOTE]

I managed to whine over a sufficiently long period that our IT people gave in,
and let me have 60Gb on /home. That together with some useful
over-sieving brought the filter time for 10,257- down to 10 hours
(from 44 hours, with very slow disk write/reads, for 3,536+). This is a
harder number (difficulty 257), but sieving was over 160M (40M-200M),
rather than 130M (40-170). Also, Greg's .poly dropped the factorbases
from 80M to 70M, so I ended up with more unique relns on a smaller
factorbase (didn't get a timing comparison on the sieving time, lots of other
user effects; condor "evictions").

New matrix ends up at

[code]
saving the first 48 matrix rows for later
matrix is 13938310 x 13938558 (3804.8 MB) with weight 937778319
(67.28/col)
sparse part has weight 858020543 (61.56/col) [/code]

so 290K fewer rows/cols; and at 61.56 -vs- 61.45 on the sparse part,
just a bit extra density. As I was observing on the post before the
Wiedermann interjection, this gets me very close to eight treads, as in

[code]
%CPU
15341 bad0 25 0 4664m 4.5g 484 R 400 4.1 269:18.79 msieve
7310 bad0 25 0 4766m 4.6g 476 R 373 4.2 13468:40 msieve

or

15341 bad0 25 0 4664m 4.5g 484 R 357 4.1 255:36.68 msieve
7310 bad0 25 0 4766m 4.6g 476 S 354 4.2 13456:43 msieve
16246 bad0 25 0 494m 209m 608 R 100 0.2 109:59.31 gnfs-lasieve4I1
16248 bad0 25 0 497m 210m 556 R 100 0.2 109:50.76 gnfs-lasieve4I1
[/code]

-Bruce (10,257+ c208 is going rather quickly; especially since I took
Richard's suggestion and cut the ranges down to 10K (from 20K, from 50K ...)
just over 3hrs per quadcore core, just under on the Opterons --- fewer
condor evictions (or maybe less other users, hard to keep track). Maybe
I'll hit 12 threads for a bit, if the current 10p257 rate holds.

After 30 console minutes I'm seeing 249 dim/min -vs- 354 for the larger
matrix, I'll try an update after the weekend (when I'm next at my office
console) if the rate picks up (initial timing were also slower on the c252).

bdodson 2008-04-28 13:01

[QUOTE=bdodson;132176]
-Bruce (10,257+ c208 is going rather quickly; ... Maybe
I'll hit 12 threads for a bit, if the current 10p257 rate holds.

After 30 console minutes I'm seeing 249 dim/min -vs- 354 for the larger
matrix, I'll try an update after the weekend (when I'm next at my office
console) if the rate picks up ...[/QUOTE]

Weekend consoles show +8.8% (to 40.8%) on 3,536+ and
+10.2% (to 11.2%) on 10,257-, so the matrix for the harder
number is quicker --- over-sieving wins again. On 10,257+
I collected the qs from a range of width 133M (just 27M left
to go) over six days; sustained on 300+ cpus. Looks like
I'll have to wait a couple of days for the last 27M to re-start
(my jobs get lowest priority here; users with fewer cpuhours
get their jobs scheduled first; can't hardly complain that I'm
not getting enough!). -Bruce

bdodson 2008-04-28 15:57

[QUOTE=bdodson;132176] As I was observing on the post before the
Wiedermann interjection, this gets me very close to eight treads, as in ...
[/QUOTE]

Ah, whilst digging around for some sort of "which core is used" info,
I found "top -H" for show all threads, and got:

[code]
%CPU
7314 bad0 25 0 4766m 4.6g 476 R 100 4.2 6081:16 msieve
23721 bad0 25 0 4661m 4.5g 476 R 100 4.1 3221:59 msieve
23726 bad0 25 0 4661m 4.5g 476 R 100 4.1 3231:59 msieve
7310 bad0 25 0 4766m 4.6g 476 R 100 4.2 8890:38 msieve
7313 bad0 25 0 4766m 4.6g 476 R 100 4.2 5867:24 msieve
7315 bad0 25 0 4766m 4.6g 476 R 100 4.2 5965:16 msieve
23691 bad0 25 0 4661m 4.5g 476 R 100 4.1 4027:32 msieve
23717 bad0 25 0 4661m 4.5g 476 R 100 4.1 3119:01 msieve
[/code]

Jason's not kidding when he mentions long-term threads. Also, I was
muttering (elsewhere) about maybe the threads were geting in each
other's way (as in "why not all 4 threads for both matrices on two
cores of a single quad?"). But that was a mis-reading of the checkpoint
info --- both matrices are getting checkpoints every four hours, so I
was only seeing every 2nd or third. This is with no sieving, but another
user on five cores.

[QUOTE]
I'll hit 12 threads for a bit, if the current 10p257 rate holds. [/QUOTE]

I'm just waiting for that last range of length 27M, which was taking under
24 hours on 300+ cpus. Both matrices will be running for another two
weeks, so a 3rd 4 thread msieve is almost certain, not too far off. -Bruce

Phil MjX 2008-04-28 18:23

hello,

I'd like to report a strange behaviour of the poly selection :
With the 1.34 windows binaries a test with a c97 ran smoothly but, with a c126, the poly selection ran for 4 days (!!) (instead of 8 hours) and a "control C" didn't stop the search after 4 hours !
The problem is that nothing is written to disk during the search and all the work was lost. Another 24h try gives the same problem.
The partial space search option doesn't work and start from scratch, with a message.

All the large coefs seems to be truncated and finished with "0000000".

It was just a test but I'd like the best polys to be saved in a file, like the .cand.all in the ggnfs distribution, at least to get the possibility of an abrupt interruption and to let some comparisons with ggnfs to be done.

Thanks Jason.

Regards.

Philippe

miklin 2008-04-28 18:38

Msieve 1.35
 
1 Attachment(s)
Jason,
Excuse that so long did not write.
Two months were in mountains. It abruptly!!!!!
And now about the main thing.
Has started to test version 1.35.
Also that interestingly I tried example RSA100.
That I have understood the first, it does not work.
And to not be proofless here files.

jasonp 2008-04-29 00:31

Pilippe: none of that functionality (limiting the search, interrupting early, etc) has been implemented. The next version will include a lot of changes and improvements to the polynomial selection, though you shouldn't expect big differences from the pol5 code.

Sergey: there are too many relations for the filtering to handle; it should work if you delete maybe 1 million relations.

miklin 2008-04-29 04:44

1 Attachment(s)
[quote=jasonp;132348]Pilippe: none of that functionality (limiting the search, interrupting early, etc) has been implemented. The next version will include a lot of changes and improvements to the polynomial selection, though you shouldn't expect big differences from the pol5 code.

Sergey: there are too many relations for the filtering to handle; it should work if you delete maybe 1 million relations.[/quote]


Jason,
Here one more example of work msieve1.35 for RSA154 in a file ggnfs_rsa154.txt.

And here that that has turned out on version 1.32 (thus entrance data identical).


[code]
Sun Apr 27 05:11:35 2008 commencing linear algebra
Sun Apr 27 05:11:36 2008 read 4027634 cycles
Sun Apr 27 05:11:58 2008 cycles contain 9816816 unique relations
Sun Apr 27 05:15:10 2008 read 9816816 relations
Sun Apr 27 05:15:27 2008 using 32 quadratic characters above 536870658
Sun Apr 27 05:21:41 2008 read 4027634 cycles
Sun Apr 27 05:21:44 2008 matrix is 4024942 x 4027634 with weight 376537877 (avg 93.49/col)
Sun Apr 27 05:23:08 2008 filtering completed in 3 passes
Sun Apr 27 05:23:10 2008 matrix is 3991550 x 3991748 with weight 374270753 (avg 93.76/col)
Sun Apr 27 05:24:20 2008 read 3991748 cycles
Sun Apr 27 05:24:24 2008 matrix is 3991550 x 3991748 with weight 374270753 (avg 93.76/col)
Sun Apr 27 05:24:24 2008 saving the first 48 matrix rows for later
Sun Apr 27 05:24:26 2008 matrix is 3991502 x 3991748 with weight 288791612 (avg 72.35/col)
Sun Apr 27 05:24:26 2008 matrix includes 64 packed rows
Sun Apr 27 05:24:26 2008 using block size 65536 for processor cache size 4096 kB
Sun Apr 27 05:25:37 2008 commencing Lanczos iteration (4 threads)
Mon Apr 28 12:18:17 2008 lanczos halted after 63129 iterations (dim = 3991496)
Mon Apr 28 12:18:27 2008 recovered 39 nontrivial dependencies
Mon Apr 28 12:18:27 2008 elapsed time 31:06:55
Mon Apr 28 12:18:27 2008
Mon Apr 28 12:18:27 2008
Mon Apr 28 12:18:27 2008 Msieve v. 1.32
Mon Apr 28 12:18:27 2008 random seeds: c74788e7 cfb8cdd3
Mon Apr 28 12:18:27 2008 factoring 8715559889770423424310203883632167048898008133505110868539099203591147550598330383715037180377356985067654557901317213606892162668124627648279771812893257 (154 digits)
Mon Apr 28 12:18:29 2008 searching for 15-digit factors
Mon Apr 28 12:18:30 2008 commencing number field sieve (154-digit input)
Mon Apr 28 12:18:30 2008 R0: -272943704267283933914006624529
Mon Apr 28 12:18:30 2008 R1: 289440566749750559
Mon Apr 28 12:18:30 2008 A0: -1479120383524660035081694220384272
Mon Apr 28 12:18:30 2008 A1: -79121815855669782650267611608
Mon Apr 28 12:18:30 2008 A2: 4929773863590481595325727
Mon Apr 28 12:18:30 2008 A3: -1300115266100187588
Mon Apr 28 12:18:30 2008 A4: -46749049796676
Mon Apr 28 12:18:30 2008 A5: 5753520
Mon Apr 28 12:18:30 2008 size score = 7.262495e-16, Murphy alpha = -5.718998, combined = 4.886474e-15
Mon Apr 28 12:18:30 2008
Mon Apr 28 12:18:30 2008 commencing square root phase
Mon Apr 28 12:18:30 2008 reading relations for dependency 1
Mon Apr 28 12:18:31 2008 read 1995313 cycles
Mon Apr 28 12:18:41 2008 cycles contain 6061614 unique relations
Mon Apr 28 12:21:34 2008 read 6061614 relations
Mon Apr 28 12:22:25 2008 multiplying 9138262 relations
Mon Apr 28 13:37:26 2008 multiply complete, coefficients have about 544.94 million bits
Mon Apr 28 13:37:54 2008 initial square root is modulo 77479
Mon Apr 28 15:17:01 2008 reading relations for dependency 2
Mon Apr 28 15:17:02 2008 read 1995838 cycles
Mon Apr 28 15:17:12 2008 cycles contain 6063340 unique relations
Mon Apr 28 15:20:06 2008 read 6063340 relations
Mon Apr 28 15:20:56 2008 multiplying 9140822 relations
Mon Apr 28 16:36:09 2008 multiply complete, coefficients have about 545.11 million bits
Mon Apr 28 16:36:34 2008 initial square root is modulo 77731
Mon Apr 28 18:15:42 2008 prp77 factor: 76648678625485733611695846561820047569238811477621174717293370669816009242343
Mon Apr 28 18:15:42 2008 prp78 factor: 113707894853030046972514394561611812429142647991789076119120797742828164069199
Mon Apr 28 18:15:42 2008 elapsed time 05:57:15 [/code]

Phil MjX 2008-04-29 08:02

[QUOTE=jasonp;132348]Pilippe: none of that functionality (limiting the search, interrupting early, etc) has been implemented. The next version will include a lot of changes and improvements to the polynomial selection, though you shouldn't expect big differences from the pol5 code.

Sergey: there are too many relations for the filtering to handle; it should work if you delete maybe 1 million relations.[/QUOTE]

Thanks Jason !

My goal was to compare the efficiency of your implementation (it was v 1.35) with ggnfs and there is no problem since it was just for testing purposes. Thanks for this work on the polynomial selection part.

But is this normal that for a c97, the poly search time was respected and that, for a c126, the search seems to run for ever (>96h) instead of a planned 8 hours ?

I think this is a bug and that is the reason of my report.

Regards

Philippe.


PS : the number is 120589172157011794277137847608483215552179876319127843442720258721624648233382859532388590243405874352824255989606819236818821

jasonp 2008-04-29 13:04

[QUOTE=Phil MjX;132369]
But is this normal that for a c97, the poly search time was respected and that, for a c126, the search seems to run for ever (>96h) instead of a planned 8 hours ?
[/QUOTE]
Right now the code sets the limits on leading polynomial coefficient and searches the entire range, regardless of how long it would take. Polynomial selection happens in two stages, and most of the runtime is packed in the first stage, which I have not optimized at all (in fact I've removed the assembly code that the GGNFS version uses, so it's between 2.5x and 4x slower than the GGNFS version). I'm planning the next release in a week or two, and some of the features you ask about will be implemented by then.

Sergey: I will think about how your RSA100 example could be made to work. Also, the RSA154 job has a huge amount of oversieving, you can probably succeed with 20-25% less relations. The filtering in v1.35 is much more effective than in 1.32 when there are many excess relations, and the matrix weight is much lower than v1.32 produced. In fact, the matrix stopped only 80% of the way to completion, so it was probably not dense enough to solve correctly. Try this: in gnfs/filter/merge.c, change TARGET_DENSITY to 75.0 and rerun the filtering

bdodson 2008-05-03 02:34

[QUOTE=bdodson;132176] ... this gets me very close to eight treads, as in

[code]
%CPU
15341 bad0 25 0 4664m 4.5g 484 R 400 4.1 269:18.79 msieve
7310 bad0 25 0 4766m 4.6g 476 R 373 4.2 13468:40 msieve

[/code] [/QUOTE]

Neither -t 12 nor -t 8 ran well; most of the extra threads were
showing "status=Sleeping" rather than Running. But -t 6 does
seem to offer improvment over -t 4, which looks like

[code]
%CPU
10129 bad0 25 0 5326m 5.2g 476 S 591 4.7 65:45.73 msieve
10123 bad0 25 0 4661m 4.5g 476 S 398 4.1 65:02.85 msieve
10105 bad0 18 0 4766m 4.6g 476 S 310 4.2 57:28.75 msieve
[/code]

Uhm, that's 14 threads, four each on the first two matrices,
six on the new one, 10,257+. A monster, by the way, 15.479M^2,
despite having sieved the same region. Switching from x^6 -10
to x^6+10 seems to have been a radical alteration in the roots;
even the ratings of the polyn looks different:

size score = 1.839597e-12, Murphy alpha = 1.635261, combined =
1.152956e-12

for 257+ and

size score = 2.036089e-12, Murphy alpha = 1.321650, combined =
1.395729e-12

for 257- despite having the same "difficulty". The previous matrix
having been 13.938M^2; so the same region seems to have somewhat
over-sieved 257-, but not so much for 257+. Lots fewer relns;
fewer unique relns.

-Bruce
(starting sieving 3,547-, supposed to be the most "difficult"
Childers/Dodson number so far; and a larger sieve region --- we're
not taking chances, and this time oversieving on purpose ...)


All times are UTC. The time now is 22:43.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.