2021-03-07, 23:51   #1
ewmayer
Sep 2002
República de California

2·5·1,163 Posts
Mlucas v20: Preview of coming detractions, or something :)

As I've noted elsewhere, v20 will have p-1 factoring support as its major feature add, with some details such as "fused p-1 stage 1 and PRP testing?" not yet finalized. (PRP-proof support will be in a later follow-on release.)

By way of performance improvements, v20 will have much-improved accuracy for exponents near the FFT-length breakover points. Some background: back in Mlucas v17, I deployed a streamlined chained-DWT-weights computation in the carry-propagation routines, which gives a ~3-7% speedup for exponents not near an FFT breakover point, and handily allows runtime "dial-in accuracy" as things get near such a breakover point. The problem? Even at the maximal accuracy setting (shortest chain length), said carry routines are a little less accurate than the non-chained ones they replaced, and their associated roundoff errors (ROEs) tend to be noisier. In fact the original high-accuracy carry macros are still there, but they require a special preprocessor directive at compile time to invoke. I foolishly let the fact that the newer chained-carry macros use a somewhat different data layout keep me from making both options available at runtime.

In v20 the layouts have been harmonized. For exponents near the limit for each FFT length, the code starts with the slightly faster chained carries in shortest-chain mode; if it still hits dangerous ROEs (>= 0.4375), it switches to the high-accuracy carries, and only ups the FFT length if those still don't manage to keep the ROEs under control. Moreover, in PRP-test mode, since we do both every-iteration ROE checking and the Gerbicz check, we allow ROE = 0.4375, because the latter check will catch any rare-but-not-unheard-of instance where such a 0.4375 error is really a fatal 0.5625 one in disguise.
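The fallback order described above can be sketched as a small state machine. This is only an illustrative Python sketch, not Mlucas code (Mlucas itself is C); the names `CarryMode`, `observed_roe`, `is_dangerous` and `escalate` are made up for the example. It also shows why a true error of 0.5625 can pass for 0.4375: a roundoff check only sees the distance to the nearest integer.

```python
from enum import Enum

ROE_LIMIT = 0.4375  # threshold quoted in the post


class CarryMode(Enum):
    CHAINED_SHORT = 1  # faster chained carries, shortest chain length
    HIACC = 2          # original high-accuracy carries
    NEXT_FFT = 3       # carries alone are not enough: use the next-larger FFT


def observed_roe(x):
    """Distance from x to the nearest integer, i.e. all a roundoff check can see."""
    return abs(x - round(x))


def is_dangerous(roe, prp_mode):
    # Non-PRP: ROE >= 0.4375 counts as dangerous.
    # PRP: exactly 0.4375 is tolerated, since the Gerbicz check backstops it.
    return roe > ROE_LIMIT if prp_mode else roe >= ROE_LIMIT


def escalate(mode, roe, prp_mode):
    """Return the mode to use after seeing `roe` while running in `mode`."""
    if not is_dangerous(roe, prp_mode):
        return mode
    if mode is CarryMode.CHAINED_SHORT:
        return CarryMode.HIACC
    return CarryMode.NEXT_FFT


# A convolution output that should have been 7 but came out as 7.5625
# rounds to 8, and the check reports an error of only 0.4375.
aliased = observed_roe(7.5625)
```

In this sketch the only difference between PRP and non-PRP mode is whether the comparison against 0.4375 is strict.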

At least that's the idea - I am currently subjecting the modified carry code to a really limit-pushing test, a PRP test of M107353937 using a 5.5M FFT. This case was one I inadvertently let slip into a special 5.5M-FFT-run queue I had using gpuowl on one of my GPUs; I meant to limit things to p < 107.3M based on several dozen expos ~107M run that way, but this one got in. I noticed its progress seemed stuck; turns out gpuowl got ~15% done using 5.5M FFT before hitting its first Gerbicz-check "EE" and going into flailing-around mode. I killed it and finished the run using 6M FFT, so I have a complete set of checkpoint Res64s for cross-comparison purposes.

The Mlucas v20 run is on the oldest of my little Intel NUC minis, a 2-core/4-thread Broadwell-CPU one, running an AVX2 build of the code. So far, 100K iterations in, there have been 27 ROEs = 0.4375 but none higher, and the interim result matches the reference run:

[2021-03-07 15:41:46] M107353937 Iter# = 100000 [ 0.09% complete] clocks = 00:07:48.699 [ 46.8699 msec/iter] Res64: F3FC9D3BEB7987FF. AvgMaxErr = 0.322385850. MaxErr = 0.437500000. Residue shift count = 24310597.
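For anyone wanting to sanity-check the numbers in such a checkpoint line, here is a small Python sketch that pulls out the fields and redoes the arithmetic. The regular expression is written against this one line only, and the 10000-iteration checkpoint interval is an assumption taken from the interval length mentioned in the follow-up post.

```python
import re

line = ("[2021-03-07 15:41:46] M107353937 Iter# = 100000 [ 0.09% complete] "
        "clocks = 00:07:48.699 [ 46.8699 msec/iter] Res64: F3FC9D3BEB7987FF. "
        "AvgMaxErr = 0.322385850. MaxErr = 0.437500000. "
        "Residue shift count = 24310597.")

m = re.search(
    r"M(\d+) Iter# = (\d+) .*clocks = (\d+):(\d+):([\d.]+) \[\s*([\d.]+) msec/iter\]",
    line,
)
p, iters_done = int(m[1]), int(m[2])

# The "clocks" field is the wall time for the latest checkpoint interval.
secs = int(m[3]) * 3600 + int(m[4]) * 60 + float(m[5])

INTERVAL = 10000  # assumed iterations per checkpoint interval
msec_per_iter = 1000 * secs / INTERVAL
pct_complete = 100 * iters_done / p

print(f"{msec_per_iter:.4f} msec/iter, {pct_complete:.2f}% complete")
```

Both recomputed values agree with what the log line itself reports.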

I fully expect the run will hit an ROE > 0.4375 and/or a G-check error at some point; that will help me properly dial in some of the attendant logic. The actual default limit for 5.5M FFT will be somewhat lower than this, around p = 106.5M for FMA-using builds, a smidge lower for non-FMA.
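To put those exponent limits in perspective, one can look at the average number of bits each FFT word has to carry, which is simply p divided by the FFT length. The sketch below assumes "5.5M" means 5.5 * 2^20 words; the helper name `bits_per_word` is made up for the example.

```python
def bits_per_word(p, fft_len_m):
    """Average bits per FFT word, assuming an FFT length of fft_len_m * 2**20 words."""
    n = fft_len_m * 2**20
    return p / n


for label, p in [("M107353937 (test case)", 107353937),
                 ("p = 106.5M (planned default limit)", 106500000)]:
    print(f"{label}: {bits_per_word(p, 5.5):.3f} bits/word @5.5M, "
          f"{bits_per_word(p, 6):.3f} @6M")
```

Under that assumption the test exponent sits roughly 0.15 bits per word above the planned default limit for 5.5M.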

Further slight accuracy improvements may be possible beyond this, e.g. in the FFT-twiddle and DWT-weights computations, but the above is the low-hanging-fruit one.
2021-03-08, 20:01   #2
ewmayer

M107353937 proved a bit too large to handle @5.5M FFT, even with the newly-reinstated HIACC carry code - here are the ROEs > 0.4375 hit in the first half-million iterations (I omit the 0.4375 ones, of which there were over 60):

M107353937 Roundoff warning on iteration 120001, maxerr = 0.453125000000
M107353937 Roundoff warning on iteration 245828, maxerr = 0.439453125000
M107353937 Roundoff warning on iteration 327741, maxerr = 0.450927734375
M107353937 Roundoff warning on iteration 444934, maxerr = 0.500000000000

Each of those triggered a switch to a re-run of the affected 10000-iteration interval @6M, after which I killed the run and restarted @5.5M, by way of data gathering. The Res64 at 500,000, where I finally stopped, matched that of the previously completed gpuowl run.
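As a rough illustration of how such warnings map onto re-run intervals, here is a Python sketch that parses the four lines above and works out which checkpoint each re-run would restart from. It assumes checkpoints fall on multiples of 10000 iterations; the regular expression is written against these lines only.

```python
import re

log = """\
M107353937 Roundoff warning on iteration 120001, maxerr = 0.453125000000
M107353937 Roundoff warning on iteration 245828, maxerr = 0.439453125000
M107353937 Roundoff warning on iteration 327741, maxerr = 0.450927734375
M107353937 Roundoff warning on iteration 444934, maxerr = 0.500000000000
"""

INTERVAL = 10000  # assumed checkpoint spacing
pat = re.compile(r"M(\d+) Roundoff warning on iteration (\d+), maxerr = ([\d.]+)")

# (iteration, maxerr) pairs for every warning in the log
warnings = [(int(it), float(err)) for _, it, err in pat.findall(log)]

# Only ROEs strictly above 0.4375 force a re-run in PRP mode; each re-run
# starts from the last checkpoint before the offending iteration.
restarts = [((it - 1) // INTERVAL) * INTERVAL
            for it, err in warnings if err > 0.4375]

print(restarts)
```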

Next I tried an exponent just over 107M, 107001617, which got through more than half a million iterations @5.5M before hitting the tripwire:

M107001617 Roundoff warning on iteration 72215, maxerr = 0.437500000000
M107001617 Roundoff warning on iteration 98979, maxerr = 0.437500000000
M107001617 Roundoff warning on iteration 124197, maxerr = 0.437500000000
M107001617 Roundoff warning on iteration 542021, maxerr = 0.468750000000

Another thing I have already added to v20 is per-run counting of ROEs (specifically ones large enough to trigger a switch to the next-larger FFT length) and Gerbicz check errors, for inclusion in the 'errors' subfield of the end-of-run JSON report to the Primenet server. Those counts suggest an effective strategy for such borderline cases: switch to the next-larger FFT length to rerun an ROE-affected subinterval, but on successful completion drop back down if the accumulated error rate remains below some threshold, say no more than 1 such error per million iterations.
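A minimal Python sketch of that drop-back rule, using the error counts from the two runs above. The function names and the way the rate is computed (errors so far divided by iterations so far) are assumptions for illustration; only the threshold of 1 error per million iterations comes from the post.

```python
def errors_per_million(n_errors, iters_done):
    """Accumulated FFT-switch-triggering errors per million iterations."""
    return 1e6 * n_errors / iters_done


def may_drop_back(n_errors, iters_done, max_per_million=1.0):
    """True if the run may return to the smaller FFT length after a re-run."""
    return errors_per_million(n_errors, iters_done) <= max_per_million


# ROEs > 0.4375 seen at 5.5M in the two runs above:
runs = {
    "M107353937": (4, 500000),  # 4 errors in the first half-million iterations
    "M107001617": (1, 542021),  # first error at iteration 542021
}
for name, (errs, iters) in runs.items():
    rate = errors_per_million(errs, iters)
    print(f"{name}: {rate:.2f} errors/Miter, drop back: {may_drop_back(errs, iters)}")
```

With the rate defined this way, a single early error keeps a run at the larger FFT length until enough clean iterations have accumulated, which may or may not be the behavior one wants in practice.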

Last fiddled with by ewmayer on 2021-03-08 at 20:03 Reason: Mein schpelink ist furchtbar, ja!

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.