
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   XYYXF Project (https://www.mersenneforum.org/forumdisplay.php?f=110)
-   -   C305_142_139 (https://www.mersenneforum.org/showthread.php?t=26319)

swellman 2020-12-19 03:03

C305_142_139
 
Ryan Propper is currently attempting to factor C305_142_139 via SNFS. This is the only composite in the XYYXF project (1 < y < x < 151) [URL="http://factordb.com/index.php?showid=1000000000044758166"]with no known factors[/URL].

This job ultimately required three large primes (3 LP) on the rational side, run as a 34-bit job and sieved with CADO (using las).

The invocation of las was (partially) as follows:
[CODE]
./las -poly ./c215.poly -q0 65210000 -q1 65220000 -I 16 -lim0 500000000 -lim1 500000000 -lpb0 34 -lpb1 34 -mfb0 67 -mfb1 98 -ncurves0 26 -ncurves1 21 -fb1 ./c215.roots.gz
[/CODE]
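For reference, the 3 LP choice is encoded in the mfb/lpb parameters above: mfb is the bit-size bound on the unfactored cofactor left after sieving, and lpb bounds each individual large prime, so mfb0 = 67 (just under 2×34) permits two large primes on side 0, while mfb1 = 98 (under 3×34 = 102) permits three on side 1. A quick sketch of that arithmetic (my gloss on the parameters, not CADO code):

```python
# Sanity-check the large-prime configuration from the las invocation above.
# mfbX bounds the bit size of the unfactored cofactor on side X; lpbX bounds
# each individual large prime. A cofactor of mfb bits can split into at most
# ceil(mfb / lpb) primes each below 2^lpb.

lpb = 34               # -lpb0 / -lpb1
mfb0, mfb1 = 67, 98    # -mfb0 / -mfb1

def max_large_primes(mfb: int, lpb: int) -> int:
    """Largest k such that an mfb-bit cofactor can be a product of
    k primes each below 2^lpb (simple bit-size bound)."""
    return mfb // lpb + (1 if mfb % lpb else 0)

# ceil(67/34) = 2 large primes on side 0, ceil(98/34) = 3 on side 1
print(max_large_primes(mfb0, lpb), max_large_primes(mfb1, lpb))
```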

With the c215.poly:
[CODE]
n: 20328474162529416647311982959537843499999146827155841098525615602167851195804786454820231727377603285082682529078490663751950024881559689820074466693727520628346302576066843023544023886449021575509433349099599424248755063838567414089807242454201325599982322964639626679563755005084965722261867838430023049
# 142^139+139^142, difficulty: 308.59, anorm: 3.31e+39, rnorm: 6.77e+56
# scaled difficulty: 311.48, suggest sieving rational side
# size = 5.823e-16, alpha = 0.000, combined = 4.242e-16, rroots = 0
type: snfs
size: 308
skew: 11.8319
c6: 1
c0: 2743582
Y1: -31814999504641997296916177121902819369397243609088
Y0: 2706170815394782263161480417494333876776014302110241
[/CODE]
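The polynomial pair can be sanity-checked against the number named in the comment, N = 142^139 + 139^142. My reading (not stated in the post) is that Y0 = 139^24 and Y1 = -142^23, so the shared root is m = -Y0/Y1 = 139^24 / 142^23 (mod N), which makes x^6 + 142·139^2 vanish mod N. A verification sketch:

```python
# Check the c215.poly pair against N = 142^139 + 139^142
# (the number identified in the poly file's comment).

N = 142**139 + 139**142

# Linear side from the poly file; inferred structure: Y1 = -142^23, Y0 = 139^24
Y1 = -31814999504641997296916177121902819369397243609088
Y0 = 2706170815394782263161480417494333876776014302110241
assert Y1 == -(142**23) and Y0 == 139**24

c6, c0 = 1, 2743582
assert c0 == 142 * 139**2

# Root condition: c6*m^6 + c0 == 0 (mod N) with m = -Y0/Y1; clearing
# denominators (degree is even) gives c6*Y0^6 + c0*Y1^6 == 0 (mod N).
# In fact Y0^6 + 142*139^2 * Y1^6 = 139^2 * (139^142 + 142^139) exactly.
assert c6 * Y0**6 + c0 * Y1**6 == 139**2 * N
print("c215.poly verified against 142^139 + 139^142")
```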

There were some false starts and bumps along the way, as well as some sage advice from several members of the forum. Sieving ultimately took Ryan ~10 days. :max:

Now in the home stretch: switching back to msieve for filtering and further post-processing on a 64-core Xeon, with TD=130, results in a ~116M matrix:
[CODE]

Thu Dec 17 05:17:14 2020 reduce to 400969081 relations and 396476986 ideals in 6 passes
Thu Dec 17 05:17:14 2020 max relations containing the same ideal: 150
Thu Dec 17 05:18:55 2020 relations with 0 large ideals: 144572
Thu Dec 17 05:18:55 2020 relations with 1 large ideals: 56456
Thu Dec 17 05:18:55 2020 relations with 2 large ideals: 667332
Thu Dec 17 05:18:55 2020 relations with 3 large ideals: 4638677
Thu Dec 17 05:18:55 2020 relations with 4 large ideals: 19434196
Thu Dec 17 05:18:55 2020 relations with 5 large ideals: 51810443
Thu Dec 17 05:18:55 2020 relations with 6 large ideals: 90047701
Thu Dec 17 05:18:55 2020 relations with 7+ large ideals: 234169704
Thu Dec 17 05:18:55 2020 commencing 2-way merge
Thu Dec 17 05:24:03 2020 reduce to 261402996 relation sets and 256910901 unique ideals
Thu Dec 17 05:24:03 2020 commencing full merge
Thu Dec 17 07:05:52 2020 memory use: 31315.8 MB
Thu Dec 17 07:06:18 2020 found 116397412 cycles, need 116255101
Thu Dec 17 07:07:06 2020 weight of 116255101 cycles is about 15113350070 (130.00/cycle)
Thu Dec 17 07:07:06 2020 distribution of cycle lengths:
Thu Dec 17 07:07:06 2020 1 relations: 6616255
Thu Dec 17 07:07:06 2020 2 relations: 8297414
Thu Dec 17 07:07:06 2020 3 relations: 9424454
Thu Dec 17 07:07:06 2020 4 relations: 9281085
Thu Dec 17 07:07:06 2020 5 relations: 9047337
Thu Dec 17 07:07:06 2020 6 relations: 8604374
Thu Dec 17 07:07:06 2020 7 relations: 7990075
Thu Dec 17 07:07:06 2020 8 relations: 7464905
Thu Dec 17 07:07:06 2020 9 relations: 6850961
Thu Dec 17 07:07:06 2020 10+ relations: 42678241
Thu Dec 17 07:07:06 2020 heaviest cycle: 28 relations
Thu Dec 17 07:07:35 2020 commencing cycle optimization
Thu Dec 17 07:11:22 2020 start with 980341833 relations

...

commencing linear algebra
read 113102881 cycles
cycles contain 384567336 unique relations
read 384567336 relations
using 20 quadratic characters above 4294917295
building initial matrix
memory use: 50442.4 MB
read 113102881 cycles
matrix is 113102703 x 113102881 (56946.4 MB) with weight 16093062952 (142.29/col)
sparse part has weight 13797140491 (121.99/col)
filtering completed in 2 passes
matrix is 113100600 x 113100774 (56946.3 MB) with weight 16093003071 (142.29/col)
sparse part has weight 13797122852 (121.99/col)
matrix starts at (0, 0)
matrix is 113100600 x 113100774 (56946.3 MB) with weight 16093003071 (142.29/col)
sparse part has weight 13797122852 (121.99/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 113100360 x 113100774 (53664.6 MB) with weight 12790337696 (113.09/col)
sparse part has weight 12258232914 (108.38/col)
using block size 8192 and superblock size 608256 for processor cache size 25344 kB
commencing Lanczos iteration (64 threads)
memory use: 74766.4 MB
linear algebra at 0.0%, ETA 2547h50m
checkpointing every 50000 dimensions
linear algebra completed 161067 of 113100774 dimensions (0.1%, ETA 2241h52m)
[/CODE]
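That initial ETA is what makes the plea below necessary: ~2548 hours of Lanczos on one machine is on the order of three and a half months. A trivial conversion, just to put the log line in familiar units:

```python
# Convert msieve's initial Lanczos ETA ("2547h50m") into days.

hours, minutes = 2547, 50
eta_days = (hours + minutes / 60) / 24
print(f"initial Lanczos ETA: {eta_days:.1f} days")  # about 106 days
```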

Anyone have access to an MPI cluster willing to help bring this job home? :sos: :batalov:

VBCurtis 2020-12-19 03:36

I'd sieve more until I got that matrix under 100M dimensions, but that won't change the solve time by more than 25%. With Ryan's resources, sieving is cheap compared to 12 weeks on the fancy Xeon... maybe one day of sieving to save 10-14 days of matrix?
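The "25%" figure is consistent with the usual rule of thumb that block Lanczos solve time grows roughly quadratically with matrix dimension (iteration count and per-iteration work both scale about linearly with it). Under that assumption (my gloss, not stated in the post), shrinking 116M to 100M dimensions saves about a quarter of the solve time:

```python
# Rule-of-thumb estimate (assumption: block Lanczos time ~ dimension^2,
# since both iteration count and per-iteration cost grow ~linearly).

dim_now, dim_target = 116e6, 100e6
speedup = 1 - (dim_target / dim_now) ** 2
print(f"estimated solve-time saving: {speedup:.0%}")  # roughly a quarter
```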

ryanp 2020-12-20 04:58

[QUOTE=VBCurtis;566637]I'd sieve more until I got that matrix under 100M dimensions, but that won't change the solve time by more than 25%. With Ryan's resources, sieving is cheap compared to 12 weeks on the fancy Xeon... maybe one day of sieving to save 10-14 days of matrix?[/QUOTE]

I can certainly try to sieve a bit more, but I'm also finding the river of uniques drying up; I'm getting a higher percentage of dupes found. Perhaps I should try with "-I 17" instead of "-I 16"?

charybdis 2020-12-20 05:13

[QUOTE=ryanp;566739]I can certainly try to sieve a bit more, but I'm also finding the river of uniques drying up; I'm getting a higher percentage of dupes found. Perhaps I should try with "-I 17" instead of "-I 16"?[/QUOTE]

Looking at your command line, you've been sieving on the algebraic side, so going back to smaller q and sieving on the rational side (add -sqside 0 to your command line) should get you some more relations.
Moving up to -I 17 or -A 32 would help you get the most out of the improved yield for small q; -A 32 is a 2^16 x 2^16 sieve region so basically halfway between -I 16 and -I 17. Be warned that -I 17 will use something like 36GB of memory; -A 32 will be more like 18GB.
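The -I/-A relationship here: -A gives log2 of the sieve area, and -I n corresponds to A = 2n-1 (a 2^n × 2^(n-1) region), so -I 16 is A=31, -A 32 doubles that area, and -I 17 (A=33) doubles it again. If las memory grows roughly in proportion to area (an assumption; the GB figures are the ones quoted above), the ~18GB vs ~36GB estimates follow:

```python
# Sieve-region sizes for CADO's las: -A is log2(area); -I n means A = 2n - 1.
# Memory is modeled here as ~linear in area -- a rough assumption.

def area_bits(I=None, A=None):
    """log2 of the sieve area for a given -I or -A setting."""
    return 2 * I - 1 if I is not None else A

a16, a32, a17 = area_bits(I=16), area_bits(A=32), area_bits(I=17)
assert (a16, a32, a17) == (31, 32, 33)   # -A 32 sits halfway between the two

mem_I17_gb = 36                # figure quoted in the post for -I 17
mem_A32_gb = mem_I17_gb / 2    # half the area -> ~18GB, matching the post
print(a16, a32, a17, mem_A32_gb)
```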

VBCurtis 2020-12-20 05:45

I concur- I'd go with side 0 and -A 32 over changing to I=17. Maybe at Q=200M?
Duplicate ratio will still be very high, but I think it'll be more efficient than going higher on -a side.

ryanp 2020-12-20 18:05

[QUOTE=VBCurtis;566742]I concur- I'd go with side 0 and -A 32 over changing to I=17. Maybe at Q=200M?
Duplicate ratio will still be very high, but I think it'll be more efficient than going higher on -a side.[/QUOTE]

Hm, I already had -sqside 0 in the command, not sure why it wasn't included above. Re-running some more sieving with "-A 32".

Here's a sample command now:

[CODE]./las -poly ./c215.poly -q0 186490000 -q1 186500000 -A 32 \
-lim0 500000000 -lim1 500000000 -lpb0 34 -lpb1 34 \
-mfb0 67 -mfb1 98 -ncurves0 26 -ncurves1 21 \
-sqside 0 -fb1 ./c215.roots.gz -out out.186490000.c215.gz -t 4 -stats-stderr[/CODE]

charybdis 2020-12-20 18:12

[QUOTE=ryanp;566779]Hm, I already had -sqside 0 in the command, not sure why it wasn't included above. Re-running some more sieving with "-A 32".

Here's a sample command now:

[CODE]./las -poly ./c215.poly -q0 186490000 -q1 186500000 -A 32 \
-lim0 500000000 -lim1 500000000 -lpb0 34 -lpb1 34 \
-mfb0 67 -mfb1 98 -ncurves0 26 -ncurves1 21 \
-sqside 0 -fb1 ./c215.roots.gz -out out.186490000.c215.gz -t 4 -stats-stderr[/CODE][/QUOTE]

This is fine if you're running special-q below your initial starting point. Otherwise, you'll be wasting lots of effort, so you should switch to the algebraic side by removing -sqside 0 from the command line. Curtis's tests on this number showed that there wasn't much to choose between the two sides.

ryanp 2020-12-25 17:33

Update: with ~1.34B uniques, I've managed to form a somewhat smaller matrix using target_density=140:

[CODE]Thu Dec 24 13:17:39 2020 commencing in-memory singleton removal
Thu Dec 24 13:18:42 2020 begin with 1341300152 relations and 1130994057 unique ideals
Thu Dec 24 13:32:22 2020 reduce to 858598906 relations and 600312276 ideals in 16 passes
Thu Dec 24 13:32:22 2020 max relations containing the same ideal: 33
Thu Dec 24 13:33:07 2020 reading ideals above 720000
Thu Dec 24 13:33:08 2020 commencing singleton removal, initial pass
Thu Dec 24 15:14:35 2020 memory use: 21024.0 MB
Thu Dec 24 15:14:36 2020 reading all ideals from disk
Thu Dec 24 15:15:27 2020 memory use: 40880.6 MB
Thu Dec 24 15:17:33 2020 keeping 736080120 ideals with weight <= 200, target excess is 4817030
Thu Dec 24 15:19:59 2020 commencing in-memory singleton removal
Thu Dec 24 15:21:40 2020 begin with 858614950 relations and 736080120 unique ideals
Thu Dec 24 15:35:04 2020 reduce to 858474141 relations and 735936294 ideals in 10 passes
...
Fri Dec 25 04:35:04 2020 reduce to 229895200 relation sets and 224306331 unique ideals
Fri Dec 25 04:35:04 2020 commencing full merge
Fri Dec 25 05:52:10 2020 memory use: 27271.7 MB
Fri Dec 25 05:52:32 2020 found 99814363 cycles, need 99574531
Fri Dec 25 05:53:12 2020 weight of 99574531 cycles is about 13941235773 (140.01/cycle)
...
Fri Dec 25 06:29:10 2020 heaviest cycle: 28 relations
Fri Dec 25 06:33:03 2020 RelProcTime: 79459
Fri Dec 25 06:33:43 2020 elapsed time 22:05:00[/CODE]

Now in LA, waiting to get an estimate for time to completion...

swellman 2020-12-31 14:08

“matrix needs more columns than rows; try adding 2-3% more relations”

Ryan tells me he keeps getting the above error message from msieve when it attempts to build the matrix. I believe this is the old “large dataset bug” biting him. At last check, Ryan was trying different TDs and various revs of msieve in an attempt to get past it.

Does anyone remember if this big data bug was ever eliminated, or techniques developed to avoid it?

Is using CADO for LA a practical option with a job of this size?

charybdis 2020-12-31 14:57

I've spent a fair amount of time browsing old threads here, and from what I recall, at least some cases of the bug were caused by the GGNFS siever occasionally producing relations with composite factors. Msieve now checks for this, but there is one caveat: a quick read of the msieve code suggests it only checks factors below 2^32 for primality, so a larger composite factor could potentially still cause problems. However, I doubt the CADO siever has such a bug; I've never seen "skipped x relations with composite factors" on a CADO-sieved job.
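The kind of check described above is easy to reproduce offline: scan each relation's factor lists and flag any entry that is not actually prime. A sketch over CADO-format relation lines (`a,b:hexfacs:hexfacs`), using a Miller-Rabin test that is deterministic below 2^64; the sample relation is made up for illustration:

```python
# Scan CADO-format relation lines ("a,b:hexfacs:hexfacs") for composite
# entries in the factor lists -- the class of corruption suspected here.

def is_prime(n: int) -> bool:
    """Miller-Rabin; deterministic for n < 2^64 with these bases."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def composite_factors(relation: str):
    """Return the composite entries in a relation's two factor lists."""
    _ab, side0, side1 = relation.strip().split(":")
    facs = [int(f, 16) for part in (side0, side1) for f in part.split(",") if f]
    return [f for f in facs if not is_prime(f)]

# Made-up example: 0x91 = 145 = 5*29 is composite and gets flagged.
print(composite_factors("1,2:3,b,91:5,d"))  # [145]
```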

The bug didn't appear in the filtering run with fewer relations, so it ought to be possible to binary-search using "filter_maxrels" to discover exactly where the bug first appears. If it's due to a corrupt relation that somehow hasn't been caught, this would find it. Otherwise I've got no clue.
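The suggested binary search is simple to script: bisect on the relation count fed to msieve via filter_maxrels, looking for the smallest count at which the bad matrix appears. A sketch with a stand-in for the actual run (run_filtering here is hypothetical; in practice it would invoke msieve with filter_maxrels=<n> and check the log for the "more columns than rows" error, and it assumes the failure is monotone in the relation count):

```python
# Bisect for the smallest relation count at which filtering breaks.
# run_filtering(n) is a placeholder predicate: True means the bug
# reproduced when filtering was limited to n relations.

def find_first_bad(lo: int, hi: int, run_filtering) -> int:
    """Smallest n in (lo, hi] for which run_filtering(n) reports failure,
    assuming failures are monotone (once bad, always bad)."""
    assert not run_filtering(lo) and run_filtering(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if run_filtering(mid):
            hi = mid          # bug reproduces with mid relations
        else:
            lo = mid          # still fine; bug lies above mid
    return hi

# Mock run: pretend the bug first appears at 1.2B relations.
fails = lambda n: n >= 1_200_000_000
print(find_first_bad(1_000_000_000, 1_360_000_000, fails))
```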

CADO linear algebra would presumably work but would be slower and more memory-intensive than msieve.

EdH 2020-12-31 16:33

What about trying remdups4 before filtering? Would that pick out composite relations?

charybdis 2020-12-31 16:54

[code] This is a standalone one-pass duplicate relations remover;
only syntax (a,b:<csv-of-factors>:<csv-of-factors>) is checked and incomplete lines are discarded;
validity of relations is not tested (nor is polynomial needed).[/code]
rather suggests the answer is no.

ryanp 2020-12-31 17:07

[QUOTE=charybdis;567859][code] This is a standalone one-pass duplicate relations remover;
only syntax (a,b:<csv-of-factors>:<csv-of-factors>) is checked and incomplete lines are discarded;
validity of relations is not tested (nor is polynomial needed).[/code]
rather suggests the answer is no.[/QUOTE]

I ran my relations file through remdups4 anyway, just to shrink the file, and gzipped it afterwards. I now have 1.36B unique rels.

I'm trying two runs on different machines:[LIST=1][*]msieve old revision (R965), TD=140[*]latest msieve, TD=130[/LIST]
Will see how they go.

EdH 2020-12-31 17:25

According to the CADO-NFS documentation (README.msieve), it is possible to perform CADO-NFS filtering on Msieve-produced relations. Running Msieve on the CADO-NFS results from that point onward might then work.

ryanp 2021-04-01 14:26

After 2+ months of LA and two days of the square root phase, we're all done:

[CODE]Wed Mar 31 13:08:30 2021 reading relations for dependency 3
Wed Mar 31 13:08:50 2021 read 49325186 cycles
Wed Mar 31 13:12:51 2021 cycles contain 167252566 unique relations
Wed Mar 31 13:51:41 2021 read 167252566 relations
Wed Mar 31 14:24:46 2021 multiplying 167252566 relations
Wed Mar 31 17:40:33 2021 multiply complete, coefficients have about 5103.31 million bits
Wed Mar 31 17:40:54 2021 initial square root is modulo 528433
Wed Mar 31 21:03:38 2021 found factor: 4342820369934629070129641050057768321597558556674831420859929736305770718265897304519624965158902046467403
Wed Mar 31 21:03:38 2021 reading relations for dependency 4
Wed Mar 31 21:04:01 2021 read 49326365 cycles
Wed Mar 31 21:08:42 2021 cycles contain 167231042 unique relations
Wed Mar 31 21:48:12 2021 read 167231042 relations
Wed Mar 31 22:22:26 2021 multiplying 167231042 relations
Thu Apr 1 01:43:35 2021 multiply complete, coefficients have about 5102.66 million bits
Thu Apr 1 01:43:56 2021 initial square root is modulo 527599
Thu Apr 1 05:11:07 2021 sqrtTime: 114940
Thu Apr 1 05:11:07 2021 p88 factor: 5275421214383200513922797574688119880498697251589940847936777262894189343169550133252051
Thu Apr 1 05:11:07 2021 p106 factor: 4342820369934629070129641050057768321597558556674831420859929736305770718265897304519624965158902046467403
Thu Apr 1 05:11:07 2021 p111 factor: 887310813910120829656915049394563462363864534415286197288227631719949635510944144198346994308391761241694168633
Thu Apr 1 05:11:07 2021 elapsed time 31:55:41[/CODE]
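The reported factors are straightforward to verify: the three primes multiply back to the C305 from the poly file, and each passes a Fermat probable-prime check (stdlib only):

```python
# Verify the reported factorization of the C305 (the "n" from c215.poly).

n = 20328474162529416647311982959537843499999146827155841098525615602167851195804786454820231727377603285082682529078490663751950024881559689820074466693727520628346302576066843023544023886449021575509433349099599424248755063838567414089807242454201325599982322964639626679563755005084965722261867838430023049
p88 = 5275421214383200513922797574688119880498697251589940847936777262894189343169550133252051
p106 = 4342820369934629070129641050057768321597558556674831420859929736305770718265897304519624965158902046467403
p111 = 887310813910120829656915049394563462363864534415286197288227631719949635510944144198346994308391761241694168633

assert p88 * p106 * p111 == n           # factors multiply back to the C305
for p in (p88, p106, p111):
    assert pow(2, p - 1, p) == 1        # Fermat probable-prime check
print("factorization verified")
```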

xilman 2021-04-01 19:15

[QUOTE=ryanp;574923]After 2+ months of LA, and two days of square root phase we're all done: [...][/QUOTE]Nice work!

jasonp 2021-04-16 13:40

FYI, I think Greg Childers finally fixed the large dataset bug; it was a buffer overflow in the clique removal and should be fixed in r1038. More generally, with the average size of the largest jobs increasing, Msieve's filtering code is long overdue for a refactoring that removes many of the current limits on the size of filtering jobs.

Congrats on a huge effort!

