2016-02-05, 09:40  #45  
Banned
"Luigi"
Aug 2002
Team Italia
3·1,619 Posts 
Quote:
Luigi 

2016-09-14, 05:08  #46 
Romulan Interpreter
"name field"
Jun 2011
Thailand
10281_{10} Posts 
Ernst, can we have a status on F29?
Just curious... 
2016-09-14, 14:12  #47 
Aug 2002
43·199 Posts 

2016-09-15, 00:50  #48 
∂^{2}ω=0
Sep 2002
República de California
26753_{8} Posts 
Well, I'm doing both the 30M and 32M-FFT runs side-by-side on my Haswell quad, so each is running half as fast as it would alone, but it makes it very easy to cross-check residues, since 30M runs only a few % faster than 32M (.1810 sec/iter vs .1860 sec/iter). I'm running each job 4-threaded, even though it gets only the equivalent of 2 cores' compute power. Why am I doing this? Because my Haswell system, despite running at stock, has never been completely stable (whereas the very same binary has been running rock-solidly on my slower Mike-built Broadwell NUC for over a year, with the only interruptions being power outages ... but note the NUC is doing 1st-time LL tests, not Fermat testing). I average around 1 week between mystery-fails; roughly half the time both jobs start emitting fatal roundoff errors (a sign of data corruption) at the same time; ~30% of the fails involve just one run - this is the reason for making both 4-threaded, so the surviving run can continue at full 4-core speed - and ~20% of the fails are BSOD-style (no UPS on this system, not worth the hassle).
When a run fails due to fatal-ROE-style data corruption it will attempt to auto-restart from the most-recent savefile; this succeeds perhaps 1/3rd of the time, suggesting that is the fraction of times the data corruption occurred in the main residue array as opposed to the auxiliary data tables needed for the FFT-mul. (Adding code logic to auto-reinit those tables is on my to-do list, but not currently a high priority.) Whenever I reboot/restart it gives me a chance to allow whichever of the 2 runs has fallen behind to catch up before restarting the other one. There have been at least 2 (perhaps 3 - too lazy to dig thru the savefiles just now) occasions where one run has started emitting non-matching residues - in such cases I have to halt both and restart from the nearest savefile preceding the start of the mismatching residues, since I have no a priori way of determining which one went off the rails.

Anyhow, both runs passed 130M iters a few days ago, which is a smidge more than 1/4 done. Total time for the side-by-side runs will be ~3 years, so only 27 months to go. :( Since I plan to do F30 on a many-core system (I'm lookin' at you, Knights Landing), it is possible those parallel runs will complete before the F29 ones, or not much later. I'll have a better sense of things by EOY, roughly the timeframe I estimate for a halfway-decent initial AVX-512 assembly-code deployment. 
2016-09-15, 05:32  #49 
Romulan Interpreter
"name field"
Jun 2011
Thailand
3·23·149 Posts 
Nice, thanks for sharing that!
That is what I (used to) do when running two LLs in parallel on two Titans - stop both and resume from the last match, as there was no way to know which one failed. I say "used to" because for the moment I have stopped my LL testing altogether (no resources), not because I would do it differently now. The old way is still the best. You can actually save a lot of time (your time and others', compared with the scenario where one test fails, you don't know it, you report a bad result, and others need to TC).

@Xyzzy: (as said before, this icon is improperly named; it has nothing to do with "yuck", I see a grumpy guy sticking out a tongue in defiance, or spite, of another. Long ago I wanted it for myself, but you gave me the lightning rabbit... hey, btw, now with Pokémon in vogue, maybe I can make some money if I sell it? Any takers?)

Last fiddled with by LaurV on 2016-09-15 at 05:36 
2016-09-21, 11:01  #50 
Banned
"Luigi"
Aug 2002
Team Italia
3·1,619 Posts 
As you probably already noticed, "my" Skylake didn't come.
I had health issues, then had to sell my mom's home; my wife's home was too small and we had to look for another one, sell the old one, and buy the new one in a city that is 450 miles away from where I live, prepare to relocate, look for a job, and some other things I could put in the unhappy-me thread. I want to apologize to the forum and to Ernst for offering a helping hand and then silently retracting it. The new PC should finally arrive after our relocation is complete, I hope by the end of October (you will know by then). If help is still needed in November, I hope I will be ready. 
2016-09-21, 21:09  #51  
∂^{2}ω=0
Sep 2002
República de California
26753_{8} Posts 
Quote:
In the meantime, the usefulness of having both runs executing locally on the same (or similar) hardware has become clear to me. If I had a 2nd *well quad, that would double my throughput, but I can't justify the expense. So I plan to just have my Haswell complete F29, during which time I will produce AVX-512 code which will hopefully bring F30 within reasonable reach, either on a KNL or one of the future desktop systems which will support that double-the-width-of-AVX2 instruction set. Luigi, you go ahead and put your hardware, when it arrives, to use for GIMPS - you definitely have a better chance of discovering a prime that way. :) 

2017-11-14, 02:19  #52 
∂^{2}ω=0
Sep 2002
República de California
5·2,351 Posts 
The 2 side-by-side runs at 30M and 32M FFT I mentioned in my Sep 2016 post both got to iter ~230M on my Haswell in late April, with me doing near-daily Res64 cross-checks, but it was a major headache, the Haswell - since powered off except for occasional avx2 code tests and tweaks - being sufficiently unstable that it was often a challenge to get 24 hours of uptime. At that point I moved the respective runs to a pair of faster systems, both physically hosted by our own David Stanfill, a.k.a. user airsquirrels. One system is David's own GPU-filled 32-core Xeon/avx2, the other is the Intel Knights Landing box a bunch of us GIMPSers interested in doing/supporting AVX-512 code development crowdfunded this past spring. After doing the basic AVX-512 assembly code port and some experiments to figure out which FFT length should be run on each of these 2 servers, the 30M-FFT run was finished on the Xeon (best per-iter time there ~24 ms using 32 threads) and the 32M run on the 64-core KNL (best time ~35 ms/iter using 64 threads). The Xeon run completed in late July, and the KNL run finished on 2 Sep.; both gave the following final Selfridge-Hurwitz residues (as well as matching full-length final residue files):

F29 is not prime. Res64: BECE1E5FFBE5EC12. Program: E17.0
R29 mod 2^36 = 68650658834
R29 mod 2^35 - 1 = 17796269382
R29 mod 2^36 - 1 = 21246329964

Since my original Haswell runs were from a common 32M-FFT savefile (as opposed to being run completely independently @30 and 32M) and had numerous restarts, as well as several residue mismatches needing backtracking to see which of the 2 runs (which were proceeding side-by-side on my quad Haswell at similar speeds, allowing for easier cross-checking) went "off the rails" data-integrity-wise, after the faster Xeon completed its run of the final ~300M iters I further used it to rerun the first ~230M iters, all @30M FFT length. That partial run just completed over the weekend, with its 230M-iter result matching those of the Haswell-begun runs. Many thanks to David for the generous gift of Xeon cycles and the IT support.

I've posted a tarball of the interim-checkpoint status files and the full final residue file via a link added to the OP in this thread. I also have persistent full checkpoint files at 10M-iter intervals, but as each such file is 2^29/8 bytes = 64 MByte, and there are 2^29/(10 million) = 53 such, the total archive is ~3.3 GB. The residue files are more or less completely random byte data, i.e. cannot effectively be compressed. I have burned a copy of that archive to a thumb drive as a backup for the one on my HD, and will make it available on request should anyone wish to do e.g. a parallel-run validation via rerun of said 10M-iter intervals.

Last fiddled with by ewmayer on 2017-11-15 at 00:33 Reason: F29 -> R29 in S-H residues table 
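The three Selfridge-Hurwitz residues quoted above are just the full residue reduced modulo 2^36, 2^35-1, and 2^36-1, small moduli that let independent programs cross-check a run without exchanging the full multi-megabyte residue. A minimal sketch in Python (the function name is mine, and the sample value is just the quoted Res64 standing in for the real 2^29-bit residue, so its reductions are illustrative only):

```python
def selfridge_hurwitz(residue):
    """Reduce a full residue to the three small Selfridge-Hurwitz
    check values: mod 2^36, mod 2^35 - 1, and mod 2^36 - 1."""
    return (residue % 2**36,
            residue % (2**35 - 1),
            residue % (2**36 - 1))

# Toy example: the quoted Res64 used merely as sample input data;
# the real S-H values come from the full 2^29-bit residue.
print(selfridge_hurwitz(0xBECE1E5FFBE5EC12))
```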
2018-01-12, 07:32  #53 
∂^{2}ω=0
Sep 2002
República de California
5·2,351 Posts 
Since I am currently in the process of adding PRP (as opposed to LL) testing for Mersenne numbers, as well as fast Suyama-style cofactor-PRP testing ('fast' in the 'when starting with a PRP-test residue for the modulus in question' sense) for both Mersenne and Fermat numbers to my Mlucas code, this seems worth sharing here. While going through a box of some old documents today, I came across a folder of overhead-projector transparencies for several talks I gave back in my 6-year stint in academia from '93-'99. One set of slides was for the following talk I gave at the 1997 West Coast NT conference at the Asilomar conference center in Pacific Grove, near Monterey. (My Christmas-gift-to-self last month was a $60 Epson flatbed scanner, which is paying for itself in all kinds of useful ways.)
There is an obvious small typo on slide 4: the 2nd bullet point should read (boldface highlights the needed fix) "...since (M_p-1)/2 is just a string of (p-1) binary ones, the Euler base-3 test amounts to a series of (p-2) squarings and multiplications by 3...":

Last fiddled with by ewmayer on 2018-01-12 at 07:33 
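Since the exponent (M_p-1)/2 = 2^(p-1) - 1 is a string of (p-1) binary ones, left-to-right binary exponentiation gives exactly the square-and-multiply-by-3 count the slide mentions. A toy sketch with plain Python bignums (nothing Mlucas-specific; the function name is mine):

```python
def euler_base3(p):
    """Compute 3^((M_p - 1)/2) mod M_p, where M_p = 2^p - 1.
    The exponent 2^(p-1) - 1 is (p-1) binary ones, so after the
    leading one-bit this is (p-2) fused square-and-multiply-by-3
    steps. For prime M_p the result is -1 mod M_p, since 3 is a
    quadratic non-residue mod M_p."""
    m = (1 << p) - 1
    x = 3                      # handles the leading 1-bit
    for _ in range(p - 2):     # the remaining (p-2) one-bits
        x = (x * x * 3) % m    # square, then multiply by 3
    return x

print(euler_base3(5) == (1 << 5) - 2)  # M_5 = 31 is prime -> True
```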
2018-01-22, 06:45  #54 
Aug 2006
5,987 Posts 
I never realized you were at Case!

2018-06-24, 03:30  #55 
∂^{2}ω=0
Sep 2002
República de California
5×2,351 Posts 
Update on the side-by-side Pépin tests of F30:
Run #1, using the AVX-512 build on all 64 processors of the 1.3 GHz "GIMPS KNL" @64M FFT, begun 25 Mar 2017, just passed the 1/3rd mark at 360M iterations. At 67.7 ms/iter the ETC is 31 Dec 2019 (ugh.)

Run #2, using the AVX2 build on all 32 processors of David Stanfill's 2.3 GHz Xeon server [E5-2698 v3] @60M FFT, begun 12 Nov 2017, is 20M iterations behind the KNL one. The best runtime I've gotten on this system is 48 ms/iter, but (for reasons not fully known; best guess is some combo of system load from other low-CPU processes and possibly temps) things more typically run 10-15% slower than that. The Xeon also tends to have more outages than the KNL, so let's add another 5% to that, say 57 ms/iter; ETC is mid-October 2019. Unlike my F29 runs on my Haswell quad, there have been no interim-residue mismatches between the above 2 runs. Hooray for ECC memory!

One interesting issue I ran into recently with regard to future tests of larger Fermat numbers relates to residue shift, such as George's Prime95 code uses to effect different-data first-time and double-check runs of a given Mersenne number at the same FFT length for both runs: I recently finished adding support for this to my Mlucas code. In the context of the LL test for Mersennes, the trickiest part, especially for highly optimized codes which sandwich the carry step between the final pass of the inverse FFT for one iteration and the initial pass of the forward FFT for the next iteration (and, as in my case, combine all 3 steps into the body of a single processing loop), is efficiently figuring out where to inject the shifted -2 which in an unshifted-residue LL test simply gets subtracted from the 0-element of the residue array. In the context of shift, one needs to first efficiently compute which residue word gets the shifted -2 - and remember, this is in the context of the crazy mix of 2 different word sizes (floor(p/n) and ceiling(p/n)) mandated by the Crandall-Fagin DWT. 
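In the standard Crandall-Fagin IBDWT layout, word j of the length-n residue carries ceil(p*(j+1)/n) - ceil(p*j/n) bits, so the number of bits below word j is ceil(p*j/n), and locating the word holding a given bit-shift is a search over that cumulative count. A hypothetical sketch of that lookup (the function name and the simple linear search are mine, not Mlucas's actual logic, which would need to do this more efficiently):

```python
def word_of_shift(p, n, s):
    """Locate bit-offset s within a length-n Crandall-Fagin
    mixed-radix residue for modulus 2^p - 1, where the bits
    below word j total ceil(p*j/n). Returns (word_index,
    bit_offset_within_word)."""
    assert 0 <= s < p
    for j in range(n):
        lo = -((-p * j) // n)        # ceil(p*j/n), integer math
        hi = -((-p * (j + 1)) // n)  # ceil(p*(j+1)/n)
        if lo <= s < hi:
            return j, s - lo
    raise AssertionError("unreachable for valid s")

# p = 31, n = 4 gives word sizes 8,8,8,7 - a mix of ceil(31/4)
# and floor(31/4), as described above.
print(word_of_shift(31, 4, 20))
```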
Then one is typically dealing with a multithreaded decomposition of the above fused FFT-pass/carry-step loop, so one needs to figure out which thread is processing said array word. Lots of fun. But the update of the residue shift count each iteration is simple: it's just a doubling (mod p), and the fact that p is prime means the odds are small that the resulting sequence of per-iteration shift counts will repeat in some kind of short-cycle sequence (or, even worse, fall into some kind of limit cycle which is independent of the initial shift, i.e. which would be the same for 2 runs starting out with different shift values).

For Fermat-mod arithmetic we have the opposite problem: there is no -2 carry needing to be injected each iteration as there is for the LL test, but the per-iteration shift-count doubling is problematic, since our modulus = 2^(2^n) + 1, i.e. our shift count gets doubled mod (2^n), which is guaranteed to yield (and then stay at) shift count s = 0 after at most n iterations. Instead updating the shift each iteration via s = 2*s+1, which one can do by including a scalar multiplier equal to the initial seed of the Pépin test, is little better; e.g. for n = 4, consider the kinds of shift-count sequences that result from different values of the initial shift:

0,1,3,7,15,15,...
1,3,7,15,15,...
2,5,11,7,15,15,...
3,7,15,15,...
4,9,3,7,15,15,...
5,11,7,15,15,...
6,13,11,7,15,15,...
7,15,15,...
8,1,3,7,15,15,...

The most promising avenue I've found here is to update the shift count via s = 2*s + random[0,1]. The reason this becomes relevant for Fermat numbers only a few indices above F30 is this: so far I've been able to use dual residue-shiftless runs at slightly different FFT lengths to ensure that all but the initial handful of iterations for each run involve different FFT data. For example, for F24 we can do one run at 896K and another at 960K or 1M. 
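The collapse of the s -> 2*s+1 update mod 2^n to the all-ones shift count, and the randomized s -> 2*s + random[0,1] fix, can be illustrated in a few lines (a sketch; the helper name and use of Python's random module are illustrative):

```python
import random

def shift_seq(s0, n, iters, randomized=False):
    """Iterate the Fermat-mod shift-count update modulo 2^n.
    randomized=False: s -> 2*s + 1 (collapses to the all-ones
    value 2^n - 1 within n steps, regardless of s0).
    randomized=True:  s -> 2*s + random bit (stays varied)."""
    s, out, mask = s0, [], (1 << n) - 1
    for _ in range(iters):
        bit = random.getrandbits(1) if randomized else 1
        s = ((s << 1) + bit) & mask
        out.append(s)
    return out

# Reproduces the n = 4 sequence for initial shift 2 quoted above:
print(shift_seq(2, 4, 6))  # [5, 11, 7, 15, 15, 15]
```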
For F30, as noted above, the minimal FFT length has crept up relative to the next-higher power of 2 due to roundoff-error trends, and each doubling of the modulus length squeezes that lower bound that much closer to the power-of-2 FFT length. For F33 I have already put in place a special FFT length of the form 63*2^k, but that largeish odd composite radix is relatively inefficient, so if it proves slower than the power of 2 just above it, it will be useful to be able to do both runs at the same power-of-2 FFT length, using the residue-shift scheme to ensure different-data-ness. 
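For reference, the Pépin test driving all of these runs is just checking 3^((F_n-1)/2) ≡ -1 (mod F_n); since (F_n-1)/2 = 2^(2^n - 1), that is 2^n - 1 modular squarings of 3, each of which the production code performs via the FFT-based multiply discussed in this thread. A toy bignum version, only feasible for tiny n:

```python
def pepin(n):
    """Pépin primality test: F_n = 2^(2^n) + 1 is prime iff
    3^((F_n - 1)/2) == -1 (mod F_n). The exponent is
    2^(2^n - 1), i.e. 2^n - 1 successive squarings of 3."""
    f = (1 << (1 << n)) + 1
    x = 3
    for _ in range((1 << n) - 1):
        x = x * x % f
    return x == f - 1

print([pepin(n) for n in range(1, 6)])  # [True, True, True, True, False]
```

F_1 through F_4 are the known Fermat primes; F_5 = 641 * 6700417 fails the test, consistent with the "F29 is not prime" and F30 results above being obtained the same way at vastly larger scale.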