mersenneforum.org (https://www.mersenneforum.org/index.php)
-   CADO-NFS (https://www.mersenneforum.org/forumdisplay.php?f=170)
-   -   CADO NFS (https://www.mersenneforum.org/showthread.php?t=11948)

fivemack 2019-04-23 22:31

I generated the factor-base file by running the cado-nfs script and letting it call makefb for me, then transcribed the command lines in L2253.wucmd to get raw las command lines.

[code]
/scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/makefb -poly L2253.jon/L2253.poly -lim 268000000 -maxbits 16 -out L2253.jon/L2253.roots.gz -t 40
[/code]

seems to be the command line for makefb

L2253.poly is just as for gnfs-lasieve4I15e except that it's just the polynomial and no sieving parameters:

[code]
n: 2688333615331433020642446747149440986283678638176205541641754312932820814295074220965678187428410875031545875881257723735692836520162515677425285432734833508071695321927492427322546769031971029
skew: 368024608.71
c0: 123272612786479316312350884349842837614862419023
c1: -8888039873820606651882838453601725169289
c2: -6398405309760975776966814915379
c3: -74102914865935467793635
c4: -344134329264960
c5: 556920
Y0: -21713911810858617860786743761277388982
Y1: 11185023447043546081
[/code]

fivemack 2019-04-23 22:52

I would be a bit wary about running with I=15 and trusting in really large Q; yield drops off as Q goes up, and you do start to get perceptible problems with duplicates (try running a C135 with gnfs-lasieve4I12e to see both these effects)

VBCurtis 2019-04-23 23:35

I did not notice the L2253 in your initial post; I thought you were testing the C207 from the Cunningham project. Your results with I=15 vs I=16 make more sense now, and while I'll try I=15 for the sake of thoroughness (and because the CADO default C210 file uses I=15) I won't expect greatness.

Thanks for breaking down the makefb startup & how you got set up for testing.

henryzz 2019-04-24 08:07

If I recall correctly, Bob has suggested running smaller special-q with a larger sieve range in the past. Maybe very small q (0-1M?) could be run with 17e. I believe duplicates won't be such an issue due to the increased sieve range (unless the duplication rate is related to q size?), and yield will be incredible at tiny q.

fivemack 2019-04-24 10:27

I=15 larger-scale run
 
[code]
$ /scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/las -I 15 \
-poly L2253.jon/L2253.poly \
-q0 232000000 -q1 233000000 -lim0 268000000 -lim1 268000000 \
-lpb0 33 -lpb1 33 -mfb0 66 -mfb1 99 -lambda0 2.2 -lambda1 3.2 \
-fb L2253.jon/L2253.roots.gz -out L2253.jon/232M \
-t 40 -sqside 1 -stats-stderr

# Average J=16320 for 52112 special-q's, max bucket fill 0.360714
# Discarded 0 special-q's out of 52112 pushed
# Total cpu time 1718332.50s [norm 1106.12+5086.3, sieving 1158819.1 (821794.0 + 100381.7 + 236643.5), factor 553321.0 (283784.5 + 269536.5)]
# Total elapsed time 49211.82s, per special-q 0.944347s, per relation 0.0194419s
# PeakMemusage (MB) = 13467
# Total 2531226 reports [0.679s/r, 48.6r/sq]
[/code]

That's slightly lower yield (but only by a couple of percent) than I would expect from gnfs-lasieve4I15e; on the other hand it used only about a third as much memory (13467M, versus forty jobs at 911M apiece = 36440M). I can't say much about timings because I haven't run on that machine at that range; I will do 233-234 to get a realistic comparison. It's 20% slower than forty copies of 15e were on average over Q=120-126.
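As a quick sanity check, the per-relation and memory figures in that summary follow directly from the quoted totals (a small sketch; all numbers are just the ones in the log above):

```python
# Quick arithmetic check on the las summary lines quoted above.
total_cpu_s = 1718332.50      # "Total cpu time"
elapsed_s = 49211.82          # "Total elapsed time"
special_qs = 52112
relations = 2531226

cpu_per_rel = total_cpu_s / relations      # the "0.679s/r" figure
elapsed_per_q = elapsed_s / special_qs     # the "0.944347s" figure
rels_per_q = relations / special_qs        # the "48.6r/sq" figure

# Memory comparison: one 40-thread las run vs forty gnfs-lasieve4I15e jobs.
cado_mb = 13467
ggnfs_mb = 40 * 911                        # = 36440 MB

print(f"{cpu_per_rel:.3f} s/r, {elapsed_per_q:.6f} s/sq, {rels_per_q:.1f} r/sq")
print(f"{cado_mb} MB vs {ggnfs_mb} MB -> {cado_mb / ggnfs_mb:.2f}x")
```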

But first, trying -ncurves0={20,25,30} and -ncurves1={20,25,30} on 16e for a fixed Q range (probably about 40k, which should get reasonable speed and reasonable statistical significance)

fivemack 2019-04-25 09:29

1 Attachment(s)
That's interesting; the timing on the ncurves runs moves around like a thermometer rather than like anything to do with mathematics. Doing large multi-threaded benchmarking runs on contemporary hardware with all its turbo boosts and the like appears quite a difficult problem.

I tried to run a 17e test; you have to rebuild everything with -DLOG_BUCKET_REGION=17 and this then doubles the size of a number of arrays; a one-threaded job tries to allocate 39GB and falls over on my busy 64G machine:

[code]
malloc_aligned(0x200000,2097152) called
malloc_aligned(0x9d1800000,2097152) called
code BUG() : condition rc == 0 failed in malloc_aligned
[/code]

On a busyish 96G machine it falls over when trying to allocate a second 39GB array; after allocating more swap and kill -STOPping everything else on the machine, I actually get some relations:

[code]
# 291 relation(s) for side-1 (232000009,20697965)
# Time for this special-q: 441.4965s [norm 0.1389+0.6362, sieving 392.7559 (323.9631 + 29.1809 + 39.6120), factor 47.9654 (29.1407 + 18.8247)]

compare 15e
# 52 relation(s) for side-1 (232000009,20697965)
# Time for this special-q: 33.3656s [norm 0.0152+0.1131, sieving 22.2214 (15.6162 + 1.8664 + 4.7388), factor 11.0159 (5.5952 + 5.4208)]

and 16e (unfortunately for a different special-Q, but time/relations is the relevant metric)
# 127 relation(s) for side-1 (231960013,7709469)
# Time for this special-q: 144.9776s [norm 0.0846+0.3650, sieving 113.0854 (82.6639 + 9.1164 + 21.3050), factor 31.4426 (18.1851 + 13.2576)]
[/code]

So for this number, which is a GNFS-193, we're seeing

15e 0.64s/r
16e 1.14s/r
17e 1.52s/r

I suppose the 2019 way to look at this would have been to hire an r5.12xlarge for a couple of hours, which actually would only have cost a dollar.

henryzz 2019-04-25 10:16

[QUOTE=fivemack;514643]That's interesting; the timing on the ncurves runs moves around like a thermometer rather than like anything to do with mathematics.[/QUOTE]

What are the axes?

Y-axis is seconds-per-specialQ and X-axis is just the sequence of specialQ; I'm running the same set of specialQ nine times with different ncurves0/ncurves1 parameters, so I was expecting nine fuzzy lines at different heights.

fivemack 2019-04-25 11:26

Sorry henryzz, I clicked 'edit' rather than 'quote' and abused my supermoderatorial powers.

There is something in the data under all this noise: changing ncurves1 substantially changes the time for that factorisation phase whilst changing the yield very little, so I'm going to try ncurves1=5,10,15 next.

[code]
30.20.2:# Total cpu time 284849.56s [norm 197.30+649.6, sieving 223552.3 (161447.4 + 17982.5 + 44122.5), factor 60450.3 (38705.0 + 21745.3)]
30.20.2:# Total 236498 reports [1.2s/r, 112.7r/sq]
30.25.2:# Total cpu time 289695.86s [norm 198.30+650.6, sieving 222878.8 (161022.2 + 17985.1 + 43871.5), factor 65968.2 (38722.5 + 27245.7)]
30.25.2:# Total 236786 reports [1.22s/r, 112.8r/sq]
30.30.2:# Total cpu time 288201.77s [norm 195.66+647.7, sieving 216766.5 (155109.5 + 17964.5 + 43692.4), factor 70591.9 (38023.6 + 32568.3)]
30.30.2:# Total 236814 reports [1.22s/r, 112.8r/sq]
[/code]

The sieving time is the one with masses of multithreaded memory access, so I can see an argument that it is going to be noisier than the other lines; indeed, the first component of the sieving time contains all the wiggles in the noisy graph I posted, while the rest are much closer to flat within a block.

fivemack 2019-04-25 11:47

Revised 17e numbers
 
The sieving part of the first 17e special-Q was much slower than the others, so my numbers in post 369 are unrealistic.

More plausible numbers (comparison is from single-threaded jobs at I=15,16,17, range 232000000..232000010; note that these are with a binary built with 17-bit bucket support, so quite possibly less efficient than the 16-bit-bucket default):

[code]
grep -E "(this special-q|relation)" ../cado-nfs-2.3.0-B17/L2253.j/1?e.x

15e.x:# 50 relation(s) for side-1 (232000009,175376172)
15e.x:# Time for this special-q: 17.4141s [norm 0.0080+0.0920, sieving 12.5049 (9.0474 + 0.8440 + 2.6135), factor 4.8092 (1.7133 + 3.0959)]

16e.x:# 113 relation(s) for side-1 (232000009,175376172)
16e.x:# Time for this special-q: 49.5885s [norm 0.0600+0.1760, sieving 34.8262 (21.4750 + 2.9080 + 10.4432), factor 14.5263 (7.1553 + 7.3710)]

17e.x:# 274 relation(s) for side-1 (232000009,175376172)
17e.x:# Time for this special-q: 266.6024s [norm 0.1643+0.6817, sieving 216.8890 (157.5922 + 14.4013 + 44.8955), factor 48.8674 (30.8008 + 18.0666)]

[/code]

15 0.348s/r
16 0.439s/r
17 0.973s/r
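Those seconds-per-relation figures are just the quoted single-special-q times divided by the relation counts (a quick check on the numbers above):

```python
# Recompute seconds-per-relation from the quoted single-special-q
# timings: (relations found, total time for that special-q in seconds).
runs = {
    "15e": (50, 17.4141),
    "16e": (113, 49.5885),
    "17e": (274, 266.6024),
}
for name, (rels, secs) in runs.items():
    print(f"{name}: {secs / rels:.3f} s/r")
```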

So for numbers this small 17e really doesn't make sense, which we knew already.

A more interesting question is what kind of yield 17e might give on SNFS jobs, particularly ugly quartics; I will use Fib(1625), quartic SNFS difficulty 271.4, as a test case.

fivemack 2019-04-26 09:45

OK, I am reasonably confident that for 16e lpa=33 lim=268000000 it is worth using ncurves1=15 rather than the default 25 (basically, each curve factors 25% of the remaining usable composites).

[code]
ncurves1    yield   total time   t_factor0   t_factor1
       5   185576    270537.39     40156.7      6422.3
      10   223265    275086.68     39581.3     11120.9
      15   234399    277763.44     39406.4     16285.0
      20   236494    277285.09     38674.7     21739.8
      25   236782    289479.05     38263.5     27276.3
      30   236810    288675.62     37848.9     32619.5
[/code]

ncurves1=15 comes from fitting lines to t_factor0+t_factor1, adding the average non-factor time, and then optimising expected-yield / total-expected-time.
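For the record, that optimisation can be sketched in a few lines: fit a straight line to t_factor0+t_factor1 as a function of ncurves1, add the average non-factor time, model the yield as saturating geometrically (using the "each curve factors 25% of the remaining usable composites" observation above), and maximise expected yield over expected time. A rough pure-Python sketch; the 25%-per-curve rate and the saturation level are assumptions calibrated on the table:

```python
# Data from the table above: ncurves1, yield, total time, t_factor0, t_factor1.
rows = [
    (5,  185576, 270537.39, 40156.7,  6422.3),
    (10, 223265, 275086.68, 39581.3, 11120.9),
    (15, 234399, 277763.44, 39406.4, 16285.0),
    (20, 236494, 277285.09, 38674.7, 21739.8),
    (25, 236782, 289479.05, 38263.5, 27276.3),
    (30, 236810, 288675.62, 37848.9, 32619.5),
]

ns = [r[0] for r in rows]
factor_t = [r[3] + r[4] for r in rows]

# Least-squares line through (ncurves1, t_factor0 + t_factor1).
n_mean = sum(ns) / len(ns)
f_mean = sum(factor_t) / len(factor_t)
slope = sum((n - n_mean) * (f - f_mean) for n, f in zip(ns, factor_t)) \
        / sum((n - n_mean) ** 2 for n in ns)
intercept = f_mean - slope * n_mean

# Average non-factor time (total minus both factoring components).
base = sum(t - f0 - f1 for _, _, t, f0, f1 in rows) / len(rows)

# Assumed yield model: each extra curve factors 25% of the remaining
# usable composites, so yield saturates geometrically towards y_max.
y_max = 236810 / (1 - 0.75 ** 30)   # calibrated on the ncurves1=30 row

def rate(n):
    expected_yield = y_max * (1 - 0.75 ** n)
    expected_time = base + intercept + slope * n
    return expected_yield / expected_time

best = max(range(1, 41), key=rate)
print(best)   # lands on ncurves1 = 15 under these assumptions
```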

fivemack 2019-04-26 15:53

17e is not really enough to make difficulty-270 quartic SNFS practical.

The yield is a bit higher, but you have to go to 34-bit large primes to get more than one relation per Q, and collecting a billion relations at nine CPU-seconds per relation is not a plausible job.

[code]
I    lpr   time/rel   rels/Q
15   32      10.09    0.06975
15   33       5.62    0.1225
15   34       3.37    0.21
16   32      11.48    0.18275
16   33       6.51    0.32425
16   34       3.92    0.54675
17   32      28.05    0.452
17   33      15.08    0.844
17   34       8.86    1.432
[/code]

(this is with rlim=268M alim=67M lpba=30, because big quartics are very asymmetric towards the rational side)
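The "not a plausible job" arithmetic is easy to make concrete (a back-of-envelope sketch; 8.857 s/r is the 17e/34-bit row from the table above):

```python
# Back-of-envelope cost of sieving the quartic at I=17 with 34-bit
# large primes: about a billion relations at ~9 CPU-seconds each.
rels_needed = 1e9            # roughly a billion relations
secs_per_rel = 8.857         # 17e, lpr=34 row from the table above

total_cpu_s = rels_needed * secs_per_rel
core_years = total_cpu_s / (86400 * 365)
print(f"{core_years:.0f} core-years")
```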

With -t4, 15e takes 7579MB, 16e takes 25610MB, 17e takes 89361MB

