mersenneforum.org Some CADO-NFS Work At Around 175-180 Decimal Digits

2020-05-05, 22:09   #100
EdH

"Ed Hall"
Dec 2009

2×7×239 Posts

Quote:
 Originally Posted by charybdis I've had a few duplicate WUs, and they all seem to be caused by a WU timing out and being resubmitted to another client but then finishing on the original client anyway. I removed all such files before running any filtering on my current job. The expired WUs don't get registered in the logfile when they finish, so they don't appear to affect timing data. . . .
Thanks! Further checking showed I had 99 duplicate WUs, which contributed only about 3.5% of the duplicates, and all of them were in the range below 1.9M.

2020-05-06, 02:45   #101
EdH

"Ed Hall"
Dec 2009

2×7×239 Posts

Quote:
 Originally Posted by charybdis I've had a few duplicate WUs, and they all seem to be caused by a WU timing out and being resubmitted to another client but then finishing on the original client anyway. I removed all such files before running any filtering on my current job. The expired WUs don't get registered in the logfile when they finish, so they don't appear to affect timing data. . . .
Indeed, it appears (I kind of remember, now) that I forgot to set wutimeout=43200 in the latest params file from Curtis, and used snapshot0 to implement it. That's why there were no duplicate WUs after 1.9M.

2020-05-06, 13:55   #102
charybdis

Apr 2020

10111102 Posts

Quote:
 Originally Posted by VBCurtis I see no reason for lim's to change when LP changes. Let's leave lim's alone for now. Your test for Q=500k vs 5M is awesome, and conclusive! 2% more uniques for the same sieve time. Let's use Q=5M for this size, pending further investigation (say, 4M or 7M or 10M may be yet better). I agree that test-sieving on ncurves will work- but not a test-sieve at a single Q value. If I did it, I'd want 3 Q's early-mid-late job; in your case, 20-50-80M would convince me. Chances are that what's fastest at one will be fastest at another Q, but I don't think it's obvious that it has to happen that way. I'll post a new params.c180 soon with 31/31LP this afternoon. Progress! As for finding the 2LP/3LP crossover, we only need to test every 5 digits since that's how the params files are organized. The work we're doing could be used to generate e.g. params.c178, which CADO would recognise and use for specifically 178-digit inputs, but the maintainers have already told me not to submit any such params files. Seems like overkill, even for us.
So like this?

Code:
tasks.I = 15
tasks.qmin = 5000000 # subject to further investigation
tasks.sieve.mfb0 = 58 # or 59
tasks.sieve.qrange = 10000

2020-05-06, 14:10   #103
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

431910 Posts

Yep! Sorry I didn't get 'round to this last night; composing today's calculus exam took more time and energy than I expected. Seems I now procrastinate just as badly as my students!
2020-05-06, 21:40   #104
charybdis

Apr 2020

10111102 Posts

Here's some more duplication rate data:
Code:
qmin      raw        unique
500000    367310494  213177040
3000000   348579594  216387208
5000000   339080413  217856601
7000000   331657087  218727230
10000000  323179335  219755058
13000000  316605494  220579118
17000000  307823056  220314561
20000000  303177219  220593021
For each run I chose a range of Q that took roughly the same CPU-time to sieve, as guessed from the timing data in the logfile. As the clients don't send in their results in exact order of Q, there will have been some slight variations, which presumably explains the dip in unique rels with qmin=17M compared to 13M and 20M. It looks as if 15M or 20M would be a good value to start at.
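The unique-relation fraction implied by the table above can be recomputed directly. A quick sketch (the numbers are copied from the post; the loop itself is mine, not part of the thread):

```python
# Unique-relation fraction per qmin, from charybdis's duplication-rate table.
# Keys are qmin values; tuples are (raw relations, unique relations).
data = {
    500_000:    (367_310_494, 213_177_040),
    3_000_000:  (348_579_594, 216_387_208),
    5_000_000:  (339_080_413, 217_856_601),
    7_000_000:  (331_657_087, 218_727_230),
    10_000_000: (323_179_335, 219_755_058),
    13_000_000: (316_605_494, 220_579_118),
    17_000_000: (307_823_056, 220_314_561),
    20_000_000: (303_177_219, 220_593_021),
}

for qmin, (raw, unique) in data.items():
    # Higher qmin trades raw yield for a better unique fraction.
    print(f"qmin={qmin:>10,}: {unique / raw:.1%} unique")
```

The unique fraction climbs from about 58% at qmin=500k to about 73% at qmin=20M, which is why the larger starting-Q wins despite producing fewer raw relations.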
2020-05-07, 00:19   #105
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

7×617 Posts

Let's go with 15M; it's possible that other polys will behave differently, and the "best" curve is rather flat from 13M to 20M. Looks like this lowers our rels_wanted by nearly 20%!! Thanks very much for exploring this parameter!
I've explored best starting-Q only for C100-C120 sized jobs, and best was small indeed (under 100k). I'll try boosting starting-Q when I resume C140-sized work later this month.
2020-05-11, 14:07   #106
charybdis

Apr 2020

1368 Posts

58.3M CPU-seconds of sieving gave me this:
Code:
Mon May 11 13:07:16 2020 commencing relation filtering
Mon May 11 13:07:16 2020 setting target matrix density to 100.0
Mon May 11 13:07:16 2020 estimated available RAM is 15845.4 MB
Mon May 11 13:07:16 2020 commencing duplicate removal, pass 1
...
Mon May 11 13:29:01 2020 found 54632218 hash collisions in 211322739 relations
Mon May 11 13:29:23 2020 added 122394 free relations
Mon May 11 13:29:23 2020 commencing duplicate removal, pass 2
Mon May 11 13:33:23 2020 found 64594317 duplicates and 146850816 unique relations
...
Mon May 11 14:39:15 2020 matrix is 17010145 x 17010371 (6703.3 MB) with weight 1760454238 (103.49/col)
Mon May 11 14:39:15 2020 sparse part has weight 1587118135 (93.30/col)
Mon May 11 14:39:15 2020 using block size 8192 and superblock size 884736 for processor cache size 9216 kB
Mon May 11 14:39:58 2020 commencing Lanczos iteration (6 threads)
Mon May 11 14:39:58 2020 memory use: 6443.4 MB
Mon May 11 14:40:41 2020 linear algebra at 0.0%, ETA 127h46m
Probably needs a bit more sieving. Poly score was similar to the last two jobs, so 31/31 seems to be a little slower than 31/32, especially when you account for the more efficient qmin on this job. Worth trying mfb0 = 59 (this was with 58), or would that be unlikely to make a difference?
(Also a c168 Homogeneous Cunningham has popped up - what params would you suggest if I decide to do that next?)
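The duplicate-removal numbers in the log above are internally consistent: duplicates plus uniques should equal the relations read plus the free relations added. A quick check (my own arithmetic, using the counts from the log):

```python
# Consistency check on the msieve filtering log quoted above.
relations = 211_322_739   # relations read in pass 1
free_rels = 122_394       # free relations added
duplicates = 64_594_317   # found in pass 2
unique = 146_850_816      # survivors after duplicate removal

total = relations + free_rels
assert duplicates + unique == total   # the log's counts add up exactly

dup_rate = duplicates / total
print(f"duplicate rate: {dup_rate:.1%}")   # roughly 30.5%
```

A ~30% duplicate rate at this point in the job is consistent with the qmin discussion earlier in the thread.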
2020-05-11, 14:32   #107
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

7×617 Posts

We surely can try mfb0 = 59, but we may as well try that on a 31/32 job that we already expect to be quicker.
Let's start a new thread for 165-170 digits. I expect this will be on the cusp where 2LP and 3LP are about the same speed. The short answer (compared to C178 settings): Cut poly select by 75% by halving admax and halving P. Use the 31/31 settings you just tried, but with A=28. Given the lower duplicate rate you saw on A=28, I'd drop Q-min to 10M. You could drop lim's a bit, perhaps 10M each, but it doesn't matter much. You already have a strong grasp of how to control rels_wanted, but I expect you'll need about 10% fewer relations for a number 10 digits smaller.
2020-08-03, 23:28   #108
charybdis

Apr 2020

2×47 Posts

I've just sieved two c179s with parameters similar to the consensus from this thread.
The first c179 used parameters
Code:
tasks.I = 15
tasks.qmin = 20000000
tasks.lim0 = 95000000
tasks.lim1 = 135000000
tasks.lpb0 = 31
tasks.lpb1 = 32
tasks.sieve.lambda0 = 1.88
tasks.sieve.mfb0 = 58
tasks.sieve.mfb1 = 90
tasks.sieve.ncurves0 = 20
tasks.sieve.ncurves1 = 13
Poly score was 1.122e-13. 64.9M CPU-seconds of sieving for 321M raw relations, 224M unique. TD=110 produced a 14.8M matrix.

The second c179 (almost a c180) used the same parameters except with lambda0 removed. Poly score was 1.000e-13. 64.4M CPU-seconds of sieving for 313M raw relations, 218M unique. TD=110 produced a 15.2M matrix.

So the second c179 only sieved about 2% slower despite having an 11% worse poly. Looks like no lambdas might be the way to go here.
For the purposes of a hypothetical c180 parameters file, 300M relations would probably be a good target - I sieved a bit more to make the matrices nicer.
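The "about 2% slower" claim can be checked by comparing the two jobs in CPU-seconds per unique relation. A back-of-envelope sketch (my own; the CPU-times and relation counts are taken from the post, with "M" read as millions):

```python
# Cost per unique relation for the two c179 jobs described above.
# Tuples are (CPU-seconds of sieving, unique relations).
jobs = {
    "c179 #1 (score 1.122e-13, with lambda0)": (64.9e6, 224e6),
    "c179 #2 (score 1.000e-13, no lambda0)":   (64.4e6, 218e6),
}

for name, (cpu_sec, unique) in jobs.items():
    print(f"{name}: {cpu_sec / unique:.3f} CPU-s per unique relation")

cost1 = 64.9e6 / 224e6
cost2 = 64.4e6 / 218e6
print(f"second job slower by {cost2 / cost1 - 1:.1%}")   # about 2%
```

The ~2% gap with an 11% worse polynomial is what suggests dropping lambda0 is a net win at this size.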
2020-08-04, 00:44   #109
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

7·617 Posts

That's odd, and means I likely don't understand what lambda does. I thought it was a finer control on mfb, specifically that mfb0 would be lambda0 * lpb0. But 31 * 1.88 is 58.28, meaning mfb0 of 58 is a tighter restriction and that lambda0 isn't doing anything in this list of settings.
If you have another similar-sized candidate available, try keeping lambda0 but changing mfb0 to 59 rather than 58. It seems I have some setting too tight, but I'm not clear on which one!
2020-08-04, 02:19   #110
charybdis

Apr 2020

2·47 Posts

Quote:
 Originally Posted by VBCurtis That's odd, and means I likely don't understand what lambda does. I thought it was a finer control on mfb, specifically that mfb0 would be lambda0 * lpb0. But 31 * 1.88 is 58.28, meaning mfb0 of 58 is a tighter restriction and that lambda0 isn't doing anything in this list of settings.
Decided to finally try and figure out what's actually going on...
If I'm understanding the documentation correctly, lambda applies to the *approximation* of the size of the cofactor, based on the estimates of the logs of the norm and the prime factors found in sieving. The cofactor isn't actually calculated unless its approximated log (base 2) is smaller than lambda*lpb; if the cofactor then turns out to be larger than 2^mfb it's thrown out.

The parameter documentation tells us that:
Code:
# In case lambda0 or lambda1 are not given (or have value 0),
# a default value is computed from mfb0 or mfb1. Since the sieving
# parameters are optimized without lambda0/lambda1, it is better to
# leave them unspecified.
A further trawl through the documentation reveals that a lambda of mfb/lpb + 0.1 is used if we leave it unspecified. The "fudge factor" of 0.1 is supposed to account for errors in the log approximation used in sieving, so that cofactors smaller than 2^mfb don't get thrown out accidentally. The fact that the correction is as large as 0.1 suggests the log estimate is rather crude, but I guess that's needed to make the sieving fast.

Having lambda*lpb < mfb, as was the case for some of the jobs in this thread and for your parameter files for small numbers, means that we throw out potential relations below our mfb bound. On the other hand, we don't waste much time calculating cofactors that end up being bigger than 2^mfb. It ends up looking like a job with a slightly smaller mfb except with a few relations that break that bound. I trust that this actually did turn out to be the optimal approach for small numbers, but it seems this might not be the case anymore at ~c180.
Perhaps the proportion of CPU-time spent *calculating* the cofactors becomes smaller as N gets bigger, so the wasted time spent calculating cofactors larger than 2^mfb becomes less of an issue? Would be helpful if someone with a better understanding of NFS could weigh in here.

In summary, I don't think lambda should be thought of as fine tuning of mfb; it's more like a quirk that's necessary because of the way the process of sieving works.
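The rule described above can be written out as a toy model. This is my own simplification for illustration, not actual CADO-NFS code: a cofactor is only *computed* when its estimated log2 size is below lambda*lpb, and is then kept only if its true size is below mfb bits.

```python
def default_lambda(mfb, lpb):
    # CADO-NFS's documented default when lambda is unspecified:
    # mfb/lpb plus a 0.1 fudge factor for sieve-time log-estimate error.
    return mfb / lpb + 0.1

def accept(est_log2, true_log2, lpb, mfb, lam=None):
    # Toy cofactorisation decision (simplified model of the process
    # described in this post, not the real implementation).
    if lam is None:
        lam = default_lambda(mfb, lpb)
    if est_log2 >= lam * lpb:
        return False          # cofactor never even computed
    return true_log2 < mfb    # exact check against 2^mfb

lpb0, mfb0 = 31, 58
print(default_lambda(mfb0, lpb0))   # ~1.971, so threshold ~61.1 bits
print(1.88 * lpb0)                  # 58.28: lambda0=1.88 barely clears mfb0=58
```

With lambda0 = 1.88 the sieve-time cutoff sits at 58.28 bits, so any cofactor whose estimated size lands between 58.28 and the default's ~61.1 bits is discarded before the exact test, even if its true size is under 58 bits. That is the mechanism by which an explicit lambda0 below the default throws away legitimate relations.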

I have a c180 (1186...) in polynomial selection. I'll try mfb=59; tweaking the default lambdas can wait.

Last fiddled with by charybdis on 2020-08-04 at 02:20
