Old 2020-01-07, 01:15   #1
kriesel
"TF79LL86GIMPS96gpu17"
Off topic

This thread is for posts that I make that don't fit the GIMPS reference character of most threads in this blog.

This is not a discussion thread. Don't post here. Comments posted in this thread may be moved or removed without notice or recourse. Post in the discussion thread at https://www.mersenneforum.org/showthread.php?t=23383

  1. Intro (this post)
  2. About me https://www.mersenneforum.org/showpo...50&postcount=2
  3. Personal bests in GIMPS https://www.mersenneforum.org/showpo...51&postcount=3
  4. Landau's fourth problem (shortly to reappear) https://www.mersenneforum.org/showpo...52&postcount=4
  5. Blogger blunder backup blackout blues https://www.mersenneforum.org/showpo...72&postcount=5
  6. Fermat numbers https://www.mersenneforum.org/showpo...65&postcount=6
  7. What is this beast called? https://www.mersenneforum.org/showpo...48&postcount=7

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-01-16 at 16:33 Reason: Added the beast
Old 2020-01-07, 01:23   #2
kriesel
About me

I'm not really comfortable with putting info about myself under "hall of fame"; seems presumptuous at best. Not sure where else to put this, so here it sits, at least for now.

It's titled simply "About me", because "who is this guy kriesel anyway, and why does he think he knows anything about GIMPS or Mersennes or anything related to them, having been a mersenne forum member only since March 2017 while GIMPS has been around since early 1996" is way too long.

Among my skills are great attention to detail and persistence, technical writing, programming & testing, system and network management, organization of data, and being misunderstood and confusing or annoying people without intending to. I'm also pretty good with a chainsaw and other tools. Although my formal education was in mechanical engineering, with some skills in electrical engineering, programming, etc., I was a floater. After my few years in an automotive research startup, where I was the only degreed engineer, instrumentation engineer, electronics designer, and programmer, and also did some HR and safety, I switched employers and led computerization of engineering analysis, drafting, and machining, was also IT manager for a small department, and was a member of the UW-Madison Network Advisory Group. I've done real-time motion control programming in macro assembler, of the 6.4-meter (21-foot) x-travel 4-axis (xyz & rotate) robot I designed, for lights-out 24/7 remote OS-integrated optical disk jukebox operation, and other technically demanding and sometimes safety-critical work. (Imagine a sub-mm 3 kW electron beam aimed at someone's head, with less than 1 mm of metal, 2 mm of water, and 3 mm of plastic between them. Inside the radiation shielding the X-ray flux was a lethal dose in about a minute. Or designing and producing space-rated hardware with zero chance of snagging or puncturing an astronaut's suit. Or designing extreme-precision mechanisms to operate for decades in ultra-high vacuum where no oil or grease or fingerprint is allowed. Or creating a package of hardware, documentation, and tools to be shipped to the South Pole for installation there by a physics professor. Two sets of documentation: one for him to read on the planes and lose, and one in the crate with the hardware.)

As someone raised on a dairy farm by Depression-era parents, and trained as an engineer, waste and ignorance are the enemy, the war is lifelong. Knowledge is good, clarity is good, accuracy is good, efficiency is good, more of each is better; iterate.

As far as Mersennes go, it all started when I read of one of Slowinski's discoveries in a science magazine. (You know, those things people made out of flat pieces of ground-up dead trees coated with white clay and marked with various colored inks, and sold at retail stores or through subscriptions via the mail, with actual delivery by the USPS.) In retrospect, the article was woeful, not mentioning the Lucas-Lehmer test or any other number theory term.

"TF79LL86GIMPS96gpu17" means I started with trial factoring in 1979 on a programmable calculator with <50 program step limit so all odds < sqrt(Mp) were trial factors, later wrote my own simple slow integer based combined LL and TF with small-prime sieving of candidate factors software "ll" beginning around 1986 in Fortran and later c, joined GIMPS as soon as I heard of it in 1996, provided Woltman a list of residues produced from "ll" as data he requested for confirming prime95's accuracy, & expanded into gpu computing in 2017. At the peak, I was as I recall ranked #2 in GIMPS for assignments completed and #3 for computing time, back when such levels were easier to reach because there were fewer participants.

Some of my current GIMPS activities, beyond production TF, P-1, LL, PRP, and DC at the GIMPS wavefronts:
  1. I routinely beta test, and alpha too, code by others, and occasionally write some of my own code
  2. Benchmarking on hardware new and old, and on multiple versions of the software, is also routine.
  3. I help reach milestones when I can (while trying to fight the temptation to poach)
  4. My financial contributions to the effort are mostly via ebay and my utility bill, and comparable in scope to the nominal 2019 fundraising, as are those of many other participants. I also contribute a great deal of time and effort.
  5. I make projections of where the project will advance to in years and decades ahead, suggest future directions for the application software, and create and maintain the bug and wish lists for gpu applications.
  6. I have tried Mersenne number TF, ECM, P-1, LL first and DC, PRP first and DC computation types on cpus, and all except ECM on gpus. ECM of suitable size is not available for gpus, and is pointless above 20M. I have all the rest running on multiple devices in multiple systems continuously.
  7. Tuning is mostly done and documented along the way, per the many gpu and software combinations I have in place; it's a moving target.
  8. Resurrecting old hardware was interesting. There's more to do. Most of what I had 2 years ago is cost-prohibitive per unit of throughput, per the comparative cost analysis I prepared and posted. It's better to spend on new hardware than on electricity for ~2005-or-older hardware. I think my oldest functional hardware is from circa 1995. Even an ancient used $25 gpu or repurposed cellphone can far exceed that old cpu's performance. My main laptop, on which this response is written, is 9 years old and still cost-competitive and running prime95 now, since it is fully depreciated. I have newer systems that are dedicated to GIMPS.
  9. Search for and test everything that may be useful for advancing GIMPS wavefront progress and get new participants up to speed, write batch scripts, document everything, share it in a book-sized blog that's mainly reference material for GIMPS. This blog https://www.mersenneforum.org/showth...922#post521922, begun May 2018, is over 200 reference posts and growing. Have a look around. Tactful suggestions for other additions or corrections are welcome.
  10. Do Windows builds for gpuowl and share them plus initial test results, help output etc, on its thread
  11. I was GIMPS QA coordinator for prime95 v18, 19, & 20 (after the discovery of the V17 shift bug). I had the privilege of getting good input from Brian Beesley on how to efficiently QA prime95 then, and the help of a squad of perhaps a dozen alpha tester volunteers.
  12. Help the GIMPS software authors, with testing & well documented bug reports, suggestions, supplementary documentation, fielding user questions so the authors are more free to write software for us all. Very occasionally, I contribute code snippets. (Some code originally intended for CUDALucas went into Aaron Haviland's CUDAPm1 fork.)
Among my current subprojects are
  1. exploring the limits of P-1 software, with a hope of perhaps someday improving software such as CUDAPm1;
  2. establishing run time scaling for the remainder of the mersenne.org exponent range for each available useful computation type in each commonly used software (useful excludes ECM);
  3. establishing, via scattered sampling runs, that the various computation types run successfully throughout the mersenne.org-suitable parameter space (for example, P-1 selects suitable bounds and completes; TF to suitable bit levels works; PRP completes accurately, etc.);
  4. determining the useful limits of the specific models of gpu that I own. (Some gpus can run higher exponent P-1 than others, but the bounds may not be adequate above some model-specific limit, so while it can be run, it would be better to run it somewhere else instead)
  5. refuting, with reliable and verifiable empirical results, dubious claims made on the forum by certain participants that various exponents correspond to Mersenne primes, offered with little proof or basis. (No information is bad; false information is worse.)
  6. developing reference material that I find useful or addresses questions of new participants, and sharing it on this blog.
  7. Following up on the predict-Mx threads' unresolved guesses and assorted dubious claims made in other threads. Current guessed or claimed primes that are testable with current technology will conclude in 2021.
  8. Ensuring that each million-exponent-value bin in the work distribution map for months or years ahead of the first-primality-test wavefront has at least one successful double-check completed. Currently the focus is on LL up to 200M. (The LL DC can be run on some old gpus I have that are not capable of running PRP via gpuowl.) PRPDC ahead of the wavefront is progressing well, with the caveat that I have switched to PRP & proof & Cert where feasible.
  9. Testing high exponents of certain forms.
It's not really related to GIMPS goals, but I have also dabbled in Proth and Wagstaff numbers at times. And Landau's fourth problem has been bugging me lately.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-12-16 at 21:55
Old 2020-01-07, 01:39   #3
kriesel
Personal bests in GIMPS

The reference info accumulated in this blog seems like an accomplishment, and has reached book size.

highest ranking reached in top producers:
#3 by P90-years
#2 by number of exponents LL tested and reported
(that was a very long time ago, 1997?, well before gpu usage began in GIMPS or the PRP test was adopted, with the aid of a small university department's PC fleet in addition to my own, and when there were far fewer GIMPS participants than there are now)

More recently, from https://www.mersenne.org/report_top_...ate=&end_date= & similar:
All work types #6 Nov 28 2020
Trial factoring #4 Nov 7 2020
P-1 factoring #1 Feb 21 2021
All first primality tests #17 Nov 14 2020
LL first primality test #24 or better (Feb 15 2021)
PRP first primality test #14 Nov 6 2020
All double-checks #6 Dec 23 2020
LL double check #10 Feb 9 2020
PRP double check #3 Oct 6 2020

highest TF bit level completed in p < 10^9: 86, https://www.mersenne.org/report_expo...0000029&full=1
(higher in project OBD; M3321933179 from 2^85 to 2^87 for example, 7076 GHzD)
highest GHzD TF bit level attempt completed: 7914.88, https://www.mersenne.org/report_expo...0000029&full=1

largest factor found by TF: 9661162819438172378751929 (83 bits), exponent 887184833, k = 444842190758108, 953 GHzD https://www.mersenne.org/report_expo...exp_hi=&full=1

largest factors found by P-1:
exponent 101837677 factor 517050261390286174198141026167350661719 (128.604 bits)


exponent 102535079 factor 408813325950827664880164653670091063297 (128.265 bits) with k value 1993529092374462718657312653312 = 2^16 × 3^3 × 13 × 31 × 137 × 9689 × 18367 × 328381 × 349187. Found 2021-02-06 14:42:16 UTC on a Radeon VII gpu running gpuowl V6.11-380 on first-test wavefront exponents with default 1M, 30M bounds.

exponent 101273659 factor 375324496211112358228921438201039041097 (128.141 bits)

exponent M102590167 factor 288065752848991018613790448294738221721 (127.760 bits)

exponent 102429427 factor 241863833036789833487894403779180959 (117.542 bits)
with k value 1180636464151995273232829877 = 3 × 11 × 13 × 241 × 6323 × 372473 × 751901 × 6448567

exponent 97875719, larger factor 6248386517510450606520265730011457, 112.3 bits, k value 31920003149659879415650912 = 2^5 × 233 × 503 × 1427 × 2591 × 120907 × 19039091 https://www.mersenne.org/report_expo...7875719&full=1


These are still small compared to what it takes to get onto the top 100 largest P-1 factors found, at ~131.2 bits: https://www.mersenne.ca/pm1user/1 or top-500 at ~117.9 bits. The current #1 in that list is a 173.2 bit factor.

highest k value for a factor found, see preceding entries, M102535079 k value 1993529092374462718657312653312 (100.653 bits)

highest exponent P-1 factored: exponent 901000031, factor 66276073654639633298220609, k value 36779173903624223 = 37 × 7538693 × 131857303 https://www.mersenne.org/report_expo...1000031&full=1 (with prime95 v29.8b6 on an i7-8750H Windows 10 laptop, in 57 days)
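
Entries like those above can be sanity-checked from the form of Mersenne factors: every prime factor q of M(p) = 2^p − 1, p prime, is q = 2kp + 1. A minimal Python sketch (my illustration, shown on two classic small cases rather than retyping the long numbers above; the same check works on any (p, q) pair listed here):
Code:
# Every prime factor q of M(p) = 2^p - 1 (p an odd prime) has the form
# q = 2*k*p + 1. Recover k, and confirm q really divides M(p) by
# checking 2^p = 1 (mod q).
def check(p, q):
    k, rem = divmod(q - 1, 2 * p)
    divides = pow(2, p, q) == 1
    print(f"p={p}, q={q} ({q.bit_length()} bits): "
          f"k={k}, form ok={rem == 0}, divides M(p)={divides}")

check(11, 23)         # k=1; 23 divides M11 = 2047
check(67, 193707721)  # Cole's 1903 factor of M67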

highest exponent P-1 completed, both stages, to GPU72-labeled bounds on mersenne.ca:
https://www.mersenne.org/report_expo...9999937&full=1 (On Tesla P100, Google Colab, several sessions)

highest exponent primality double checks
LL:
200000069
my highest LL DC that was confirmed by someone else's TC: 145500007

PRP:
187046537
332897017 https://www.mersenne.org/report_expo...2897017&full=1 sort of a DC; residue types not matched, but both returned composite

My highest first test PRP that has been dc verified: 187046537
By someone else: 121596589
PRP with proof, Cert by someone else: 320019937

highest exponent primality first-test:
LL 402143717 https://www.mersenne.org/M402143717
PRP 655685803 https://www.mersenne.ca/exponent/655685803


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-02-21 at 19:50 Reason: updated for top producer ranks
Old 2020-01-07, 01:44   #4
kriesel
Landau's fourth problem

Are there infinitely many primes of the form x^2+1 for integer x? How seductively simple a question that seems.
Posed in 1912 as one of Landau's four problems, it remains unanswered still.

Per https://en.wikipedia.org/wiki/Landau%27s_problems, several bounds have been placed around it.
Iwaniec: infinitely many n^2+1 with at most two prime factors;
Ankeny: x^2+y^2 with y = O(log x) rather than y=1;
Deshouillers and Iwaniec: greatest prime factor at least x^1.2 rather than x^2;
Conversely, the Brun sieve establishes an upper limit on the density of such primes, such that ALMOST all numbers of the form n^2+1 are composite.

Since an odd x squared is odd, x^2+1 is even for odd x, hence composite for odd x > 1.
A few low values:
x    x^2+1
1    2
2    5
3    10 even
4    17
6    37
8    65 composite (semiprime)
10   101 prime
12   145 semiprime
14   197
A list of the first 10,000 such primes is at https://oeis.org/A002496/b002496.txt, which includes a lot of references also.
A few others are
http://nntdm.net/papers/nntdm-22/NNTDM-22-4-73-77.pdf
http://mathworld.wolfram.com/LandausProblems.html
https://www.mersenneforum.org/showthread.php?t=21600
http://devalco.de/quadr_Sieb_x%5E2+1.php (with a few short programs)

Wolf searched up to 10^20 in 2008; https://arxiv.org/pdf/0803.1456v1.pdf
revised 2009 https://arxiv.org/pdf/0803.1456v2.pdf
revised 2010 https://arxiv.org/abs/0803.1456
Wolf and Gerbicz computed a table up to 10^25 in 2010.

Grantham followed with up to 6.25×10^28 in 2016. His effort is described at https://drive.google.com/file/d/0B_r...1UcXdQUEU/view and makes use of sieving.
Per Grantham (slide 13) "If x2 + 1 is divisible by an (odd) prime, that prime must be 1 mod 4. We want to sieve by these primes."
That list is in turn sieved by primes up to sqrt(x).
THAT list is produced by a sieve of Eratosthenes. Triple-level sieving.
So to get to 10^28, sieving would be no higher than 10^14, 10^7, and 10^3.5 at the three levels.
A sieve bit map of 4k+1 values covering 10^14 is 10^14/4 = 2.5×10^13 bits = 3.125×10^12 bytes,
rather large compared to common system memory or disk space. So the sieving is probably done to a limited extent or in segments. Grantham does the computation in parallel, distributing the large sieve across multiple processors.
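
To make the multi-level sieve idea concrete, here is a single-segment toy sketch in Python (my own construction, nothing like Grantham's distributed code): find primes p ≡ 1 mod 4, take a square root of −1 mod each, and cross off the x values whose x^2+1 those p divide.
Code:
# Toy single-segment version of the multi-level sieve for primes x^2 + 1.
# Level 1: sieve of Eratosthenes. Level 2: keep primes p = 1 (mod 4) and
# find r with r^2 = -1 (mod p), so p | x^2 + 1 exactly when x = +-r (mod p).
# Level 3: cross off those x; surviving x have x^2 + 1 prime.
def small_primes(n):
    s = bytearray([1]) * (n + 1)
    s[:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if s[i]:
            s[i * i :: i] = bytearray(len(s[i * i :: i]))
    return [i for i in range(n + 1) if s[i]]

def sqrt_minus_one(p):  # p = 1 (mod 4): r = a^((p-1)/4) for a nonresidue a
    a = 2
    while pow(a, (p - 1) // 2, p) != p - 1:
        a += 1
    return pow(a, (p - 1) // 4, p)

def count_x2p1_primes(xmax):
    # x^2 + 1 is even for odd x, so only x = 1 and even x can qualify
    alive = bytearray(xmax + 1)
    alive[1] = 1
    alive[2::2] = bytearray([1]) * len(alive[2::2])
    for p in small_primes(xmax):  # primes <= xmax cover sqrt(x^2 + 1)
        if p % 4 != 1:
            continue
        r = sqrt_minus_one(p)
        for root in (r, p - r):
            for x in range(root, xmax + 1, p):
                if x * x + 1 != p:  # don't cross off p itself
                    alive[x] = 0
    return sum(alive)

print(count_x2p1_primes(100))  # -> 19, matching A002496 for x = 1..100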

The known counts of such primes to powers of ten up to 10^28 are listed at https://oeis.org/A083844

I wrote a simple, direct-primality-testing program (no presieving) in Perl. It uses single-word math through 10^14 and then switches to the necessary but slower multiword math, and becomes computationally expensive around x^2+1 = 10^20.
It produces matching counts of primes to that level, so it's not proven to be incorrect, and may even be correct, albeit slow and inefficient.
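
A rough Python analogue of that direct approach (assumed, not the original Perl): test x^2+1 for each candidate x with Miller-Rabin and tally counts per decade, which is also the shape of the empty-decade search described below.
Code:
# Direct testing, no sieve: try x = 1, 2, 4, 6, ... and tally primes of
# the form x^2 + 1 per decade. The fixed Miller-Rabin bases below are
# deterministic for n < 3.3e24; beyond that, treat results as probable.
from collections import Counter

BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_prime(n):
    if n < 2:
        return False
    for p in BASES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

decade_counts = Counter()
x = 1
while x * x + 1 < 10 ** 8:
    v = x * x + 1
    if is_prime(v):
        decade_counts[len(str(v))] += 1
    x += 1 if x == 1 else 2  # after x = 1, only even x can qualify
for d in sorted(decade_counts):
    print(f"10^{d-1}..10^{d}: {decade_counts[d]}")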

Let j be any natural number, out to positive infinity.
Considering each decade 10^(j-1) < x^2+1 < 10^j as a separate interval, the interval size is 0.9×10^j,
the density of primes of form x^2+1 within it is ~0.55×10^(-j/2)/j and the count for the interval is ~0.495×10^(j/2)/j.
If this holds up the number line, the sum of interval counts is infinite. That's a big if, which no one's been able to prove or disprove.

If however, the number of such primes is finite, there could be only a finite number of decade intervals containing such primes. Excluding a decade as empty is much less computationally intensive than counting the primes in the decade; finding just one such prime suffices.
For the total number of near square primes to be finite, it is not enough to show a single 10-fold empty interval, or even many of them.
It requires that the nonempty intervals be finite in number, and that requires the empty intervals be infinitely many.

A modified version of my little program to search for possibly empty decades has been run up through x=10^2000 and found such a prime in each decade, so there is no empty decade in any tried, in x^2+1 up to <10^3999.
Again, it becomes computationally expensive, limiting it to around x=10^2000. Faster hardware and faster software could move that a bit, but the run time scaling is VERY steep.

Like so many other things, this topic too is apparently not immune to a bit of crankery.
https://www.quora.com/Since-Landaus-...ng-by-a-thread
https://www.scienceforums.net/topic/...ns-for-newton/
https://math.stackexchange.com/quest...roblem-correct


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf landau4.pdf (41.3 KB, 149 views)

Last fiddled with by kriesel on 2020-04-18 at 13:47
Old 2020-01-07, 08:04   #5
kriesel
Blogger blunder backup blackout blues

It started innocently enough; move the "About me" post from the hall of fame thread to the off topic thread. Well, threads are ordered by date when first posted, which put About me ahead of the intro/table of contents of the off-topic thread. Simple, copy the text, delete the post, add it to the end as a reply, right? Wrong. Deleting the first post deletes the thread.

Well, I thought, luckily Xyzzy backs up the forum. Contact him to have the thread restored from backup. He had seemingly dismissed my concerns about separate backup for the blog long ago, so his backups must be pretty good, right? If a server disk crashes and the whole forum needs to be restored, yes. But he tells me it does not support restores at the post or thread level (or presumably the blog level); only a full forum restore.

So, off to look for usable copies of the lost content, through various systems' open browsers, drives, caches, Google cache, the internet archive (Wayback Machine), etc. Google cache only contained about one sentence of the lost content. The Internet Archive only captures the top level page of the mersenneforum.org site, nothing beneath. My local systems yielded nothing useful. I had previously "cleaned off duplicate content". Gone: https://www.mersenneforum.org/showthread.php?t=24735 Off topic
https://www.mersenneforum.org/showpo...01&postcount=2 Landau's fourth problem
  • They are NOT backed up on the internet archive (apparently nothing below the top mersenneforum.org page is archived there; the mersenneforum.org robots.txt is set to prevent site crawls below the main page, so that's by design)
  • They are NOT available from backup from Xyzzy (already asked and got an unwelcome reply, that granularity is not available, only the whole mersenneforum blog is retrievable from monthly backup)
  • They are NOT available by the back button in my web browser in any still open tabs on mersenneforum.org content (multiple systems checked)
  • They are NOT available in my web browser cache (multiple systems checked, they seem to have been flushed by more recent page views) or the content was never there, just the javascript that pulled the now-deleted content from somewhere
  • They are NOT available in Google cache (only the url and about the first sentence is retrievable there, while I'm after the full post content)
So I made a tedious, makeshift, manual backup of what remained, then searched for partial-website-backup utilities for Windows. Tried, from https://listoffreeware.com/best-free...e-for-windows/:
  • cyotek webcopy (instead of just the blog, tries to download the whole mersenne forum and other web sites too)
  • offline downloader (downloads nothing, seems unable to cope with a network share as destination)
  • free download manager (lacks needed features shown on list page; apparently user must individually specify each page to download separately)
  • HTTrack site copier seems to work and is not too awful to use; it downloads a bit too much, probably because of linkage through the "similar pages" section automatically added to the bottom of all forum pages, and eventually errors out and stops, after apparently fully copying the whole blog, attachments and whatnot, plus a moderate number of extra pages.
Next task is to re-create the rest of the lost content, regarding Landau's fourth problem.
So, bloggers beware, backup elsewhere.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-01-07 at 08:28
Old 2020-10-04, 16:54   #6
kriesel
Fermat numbers

Fermat numbers, Fn = 2^(2^n)+1, are prime for n = 0 to 4. Fermat numbers F5 through F32 have been determined composite, through finding factors or performing Pepin tests for primality. These numbers get large very fast. F33 is ~2^33 ≈ 8.6 billion bits long and is beyond the current capability to Pepin test in reasonable time, except perhaps on the faster supercomputers. https://en.wikipedia.org/wiki/Fermat_number
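
Pepin's test itself is compact: for n ≥ 1, Fn is prime iff 3^((Fn−1)/2) ≡ −1 (mod Fn). A Python sketch for the small cases (instant through F7; F33 is the same one-liner in principle, but with ~2^33 squarings of an 8-gigabit number):
Code:
# Pepin's test: for n >= 1, F(n) = 2^(2^n) + 1 is prime
# iff 3^((F(n)-1)/2) = -1 (mod F(n)).
def pepin(n):
    F = (1 << (1 << n)) + 1
    return pow(3, (F - 1) // 2, F) == F - 1

for n in range(1, 8):
    print(n, pepin(n))  # True for n = 1..4, False for 5..7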

Ernst Mayer's well written article in Odroid is worth a read for general background. https://magazine.odroid.com/article/...tical-history/

F33's size is ~2^33 bits as a compact integer; primality testing involves ~2^33 repeated squarings of 3 mod F33. https://en.wikipedia.org/wiki/P%C3%A9pin%27s_test Such huge numbers are processed using FFTs. FFTs are usually most efficient when based on powers of 2, or other sizes that are quite smooth (having only very small factors), and no larger than necessary for accuracy.
Lists of 7-smooth possible sizes are here: https://www.mersenneforum.org/showpo...75&postcount=7
Recently gpuowl has been extended to use fft lengths that are 11- or 13-smooth.
If I've read the source correctly, Mlucas uses even 31-smooth.
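
A sketch of enumerating candidate smooth fft lengths (my own illustration; the real applications have hand-tuned kernels per length). It lists 13-smooth word counts at or above the ~506M-word F33 estimate discussed below (fft sizes here use binary M, 2^20 words, as in "57344K" above):
Code:
# Enumerate 13-smooth candidate fft lengths (in words) near a target,
# Hamming-number style with a heap.
import heapq

def smooth_up_to(limit, primes=(2, 3, 5, 7, 11, 13)):
    seen, heap = {1}, [1]
    while heap:
        n = heapq.heappop(heap)
        yield n
        for p in primes:
            m = n * p
            if m <= limit and m not in seen:
                seen.add(m)
                heapq.heappush(heap, m)

target = 506 * 2**20  # ~506M words, the F33 estimate below
cands = [n for n in smooth_up_to(2 * target) if n >= target]
print(cands[0], 2**29 in cands)
# prints the smallest candidate at/above target, and True:
# 2^29 words = 512Mi is among the qualifying lengths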

Maximum usable bits per DP word declines as fft length and exponent increase. For example, in gpuowl P-1, a 24M Mersenne exponent requires a 1280K fft, for 18.31 bits/word, while a near-1G exponent requires a 57344K fft length, 17.03 bits/word.
CUDALucas, gpuowl, mprime, etc. all need about 56M words or more for fft multiplication up to 10^9 ≈ 2^30.
The number of usable bits/word declines slowly as operand size/fft length grows. More space is needed for accurate carry accumulation, or something like that.

To estimate how large an fft length would be required for F33, consider that at a gigadigit exponent (~3.32G) in gpuowl V6.5-84, which supported such large Mersennes, it was 16.5 bits/word, declining at about 0.7 bits/word per factor of ten in exponent: https://www.mersenneforum.org/showth...384#post546384, which extrapolates to ~16.2 bits/word at F33 size.
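
That extrapolation as arithmetic (a toy linear-in-log model fit to the two data points in the text, not anything the applications actually compute):
Code:
# bits/word model: 16.5 bits/word at a 3.32e9-bit operand, dropping
# ~0.7 bits/word per factor of 10; fft words = operand bits / bits-per-word.
import math

def bits_per_word(operand_bits):
    return 16.5 - 0.7 * math.log10(operand_bits / 3.32e9)

f33_bits = 2**33
bpw = bits_per_word(f33_bits)
print(round(bpw, 2), round(f33_bits / bpw / 2**20, 1))
# ~16.21 bits/word and ~505 (binary M) words: consistent with the
# ~506M-word figure quoted below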


Primality testing

Per https://www.mersenne.ca/credit.php?f...30&worktype=LL the effort to primality test Fermat numbers is:
F30 62610 GHzD
F29 14175
F28 3206
This seems to be slightly steeper than the ~p^2.1 rule I've seen in Mersenne primality tests. (That would give 58923 for F30 extrapolating from F28's figure. The above figures fit ~p^2.14.)
Extrapolating, F33 would be about 8^2.14 times longer than F30, if the code existed, the hardware resources were adequate, and there was no significant performance drop due to data size (less effective caching, or whatever): 8^2.14 × 62610 ≈ 85.6 × 62610 ≈ 5.4E6 GHzD.
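
The same power-law extrapolation in a few lines (just the arithmetic above, with the scaling exponent as a parameter):
Code:
# Effort extrapolation at effort ~ p^2.14 (Fermat exponents p = 2^n).
def extrapolate(ghzd, p_from, p_to, power=2.14):
    return ghzd * (p_to / p_from) ** power

print(extrapolate(3206, 2**28, 2**30))       # ~6.2e4, near the listed 62610
print(extrapolate(3206, 2**28, 2**30, 2.1))  # ~5.9e4, the p^2.1 figure
print(extrapolate(62610, 2**30, 2**33))      # ~5.4e6 GHzD for F33
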
There's a Lucas-Lehmer test completed but not double-checked for M999999937 using CUDALucas. It took ~six months on the fastest available gpu at the time, an NVIDIA GV100, and is listed as 47431 GHzD. See https://mersenneforum.org/showpost.p...54&postcount=8
Comparing these figures indicates Fermats take ~13% more effort than Mersennes of equivalent exponent. In the following I assume the effort is the same.

I think the Pepin test for Fermats would take similar time per iteration and have similar iteration count to Mersenne PRP for operands of the same size.

Gpuowl does not handle Fermat numbers.
Gpuowl does not support the required fft length, which would be around 8G bits / (16.2 bits/word) = 506M words in DP floats. The current max gpuowl fft length, 120M, handles a little over 2^31, and does not support gigadigit Mersennes (3.32G exponent), much less 8G exponents. It would need to be extended to at least ~506M, if not 512M or more.
Ernst's statement about Pepin testing F29 in 30 and 32M ffts confirms this estimated requirement. https://www.mersenneforum.org/showpo...7&postcount=48
Also F30 at 60 & 64M; 68 & 57 ms/iter https://www.mersenneforum.org/showpo...5&postcount=55

CUDALucas is a gpu implementation of the Lucas-Lehmer test for Mersenne primes on NVIDIA gpus. It does not support a sufficiently large fft length for 2^33-bit tests, or Mersenne exponents larger than 2^31 − 1 in its signed integer exponent variable, or a Jacobi check for the LL computation. It does not implement PRP, or the GEC check, or the Pepin test for Fermat numbers. CUDALucas uses the CUDA fft library, so as I recall, the fft length is hard-capped at 256M words for 1-d DP ffts, until NVIDIA decides to extend the library. Then it would require someone to extend the CUDALucas code to match. Or a top layer of Karatsuba could be used, doing 3 muls and some adding and subtracting to accomplish double-length squaring and work around the current CUDA library limit. And in either case, implementing the Pepin test.
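
The Karatsuba top layer mentioned here, sketched with Python integers standing in for the capped-length fft squaring (the split is the point, not the arithmetic):
Code:
# Square a 2n-bit x with three n-bit squarings: split x = a*2^n + b, then
#   x^2 = a^2*2^(2n) + ((a+b)^2 - a^2 - b^2)*2^n + b^2
# where (a+b)^2 - a^2 - b^2 = 2ab, so no separate full-size multiply.
def square_split(x, n):
    a, b = x >> n, x & ((1 << n) - 1)
    a2, b2, s2 = a * a, b * b, (a + b) * (a + b)  # the three squarings
    return (a2 << (2 * n)) + ((s2 - a2 - b2) << n) + b2

x = 0xFEEDFACECAFEBEEF_0123456789ABCDEF
assert square_split(x, 64) == x * x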

Mlucas supports testing Fermat numbers. https://www.mersenneforum.org/mayer/README.html The largest Mersenne number testable with it (ignoring run-time concerns) is that with the largest unsigned 32-bit prime exponent 4294967291, which is near F32. It also appears, based on a light reading of the V19 source code, to include support for 512M fft length and F33. And the extension to 512M fft is described here; large fft benchmarks on Knights Landing here.

When Ernst Mayer tested F29 and F30, he made two runs and saved files from both at 10M-iteration intervals. Duplicating runs may be avoidable by implementing Robert Gerbicz' error check for Pepin tests. https://www.mersenneforum.org/showthread.php?t=22510

Maximum fft length implemented in mprime/prime95 depends on cpu type; 32M for SSE2, 50M for FMA3, 64M for AVX512. These fall far short of what F33 would require. https://www.mersenneforum.org/showpo...74&postcount=8

A Radeon VII on Linux with the ROCm driver and good overclockable-to-1200MHz gpu memory can reach 510 GHzD/day in gpuowl at 5M fft length. However, on Windows, I've observed considerable performance rolloff at much higher fft lengths (14-48M). And Ernst Mayer has reported markedly lower performance at 8M fft length than at 5M on Linux.
Assuming that extended precision for the exponent and other effects leave ~half the performance of 5M fft: 5.4E6 GHzD / (510/2 GHzD/day) ≈ 21190 days ≈ 58 years on a Radeon VII. From the rolloff I've seen, and the considerable further fft length extension needed, that may be optimistic.
A very rough estimate of the gpu ram required is >12GB; maybe 16GB on a Radeon VII is enough. https://www.mersenneforum.org/showpo...93&postcount=7

The Knights Landing 64-core system on which F29 was partly tested was based on KNL Xeon Phi, a many-core Intel cpu design, discontinued in 2018. KNL's stated peak DP performance spec was almost comparable to that of a Radeon VII. But its memory bandwidth is ~2.5 times slower. https://en.wikipedia.org/wiki/Xeon_Phi

The 32-core AVX512 system on which F30 verification was planned to be concluded gave a completion time consistent with a 2-year entire run for F30; for F33, centuries. https://www.mersenneforum.org/showpo...1&postcount=75

Estimating save file size by Mersenne gpuowl scaling, F33 would be about 1GB, manageable. https://www.mersenneforum.org/showpo...7&postcount=22
https://www.mersenneforum.org/showpo...3&postcount=52 in fact says 64MB each for F29. That would scale to 1GB for F33. Saving every 10M-iteration checkpoint of 2 parallel runs as Ernst did for F30 would add up: 1GB × 2 × 2^33/10^7 ≈ 1.72TB, manageable.


Factoring

Before tackling such a monster test, a lot of factoring is probably in order.
MMFF is a possibility for TF. So is Mfactor.
http://www.fermatsearch.org/index.html lists ecm software.
http://www.fermatsearch.org/factors/programs.php lists several varieties.
http://www.fermatsearch.org/download.php has links but to older versions (prime95 v28.10 for example, while current is v30.3)
Not sure which would support such a large number; prime95 would not because of fft length.
F33 is 2^(2^33)+1, or in k,b,n,c form for worktodo files, 1,2,8589934592,1
A worktodo line would be
ECM2=1,2,8589934592,1,B1,B2,curves_to_run[,"comma-separated-list-of-known-factors"]
Pminus1=1,2,8589934592,1,B1,B2[,"comma-separated-list-of-known-factors"]
PRP=1,2,8589934592,1[,how_far_factored,tests_saved][,prp_base,residue_type][,"comma-separated-list-of-known-factors"]
After I select bounds rather arbitrarily, for time estimating on a 4-core AVX512 capable i5-1035G1 processor,
prime95 V30.3b6 interprets these F33 related worktodo lines as involving p=2:
ECM2=N/A,1,2,8589934592,1,250000,25000000,1
Pminus1=N/A,1,2,8589934592,1,250000,25000000
PRP=N/A,1,2,8589934592,1,100,0

Trying F32, it interprets these also as involving p=2;
ECM2=N/A,1,2,4294967296,1,250000,25000000,1
Pminus1=N/A,1,2,4294967296,1,250000,25000000
PRP=N/A,1,2,4294967296,1,100,0

Trying F31 crashes prime95 when Test, Status is selected. Retrying them individually,
ECM2=N/A,1,2,2147483648,1,250000,25000000,1 this line crashes prime95
Pminus1=N/A,1,2,2147483648,1,250000,25000000 1.5 years estimated
PRP=N/A,1,2,2147483648,1,100,0 this line crashes prime95

Trying F30: p=1073741824, on i5-1035G1, which supports 64M AVX512 ffts and ~1.168G max Mersenne exponent, prime95 V30.3b6 yields these estimates:
ECM2=N/A,1,2,1073741824,1,250000,25000000,1 13 days for one curve,
Pminus1=N/A,1,2,1073741824,1,250000,25000000 4 days,
PRP=N/A,1,2,1073741824,1,100,0 1 year 2 weeks.
Extrapolation of those run times to a hypothetical capability not yet included would be 8^2.14 ≈ 85.6 times longer: 3 years per ECM curve, 11 months for P-1, 90 years for the PRP test.
The PRP time estimate could be taken as a rough estimate for Mlucas run time on the same hardware. As I recall Ernst Mayer indicated on the Mlucas readme page that Mlucas performance was comparable to mprime on AVX512.

GP2 on factoring attempts on F29 vs F33: "it takes about 120GB of memory and about a week to do a single ECM curve for F29." on one core; later clarified to 118 hours (~5 days). https://mersenneforum.org/showpost.p...28&postcount=2 Others comment they've run multicore on 32GB ram in less time.

Mersenne number P-1 run time scaling runs with Gpuowl V6.11-3xx on a Radeon VII show the number of stage 2 buffers declining such that, extrapolating from 400M to 1G, it appears stage 2 would drop below 1 buffer and fail by 5G exponent. The Radeon Pro VII won't help here, because while it's faster at DP, it has the same amount of gpu ram. See the Radeon VII attachment at https://www.mersenneforum.org/showpo...5&postcount=17
Estimated run time, by extrapolation from 500M and 1G (0.95 and 4.02 days respectively for both stages), scales as ~p^2.08. Rough extrapolated F33 P-1 time is then 4.02 days × 8.59^2.08 ≈ 352 days ≈ 11.6 months ≈ 0.965 years. Existing P-1 code I'm aware of (before Gpuowl v7.x) has essentially no error checking. It would need to be rewritten substantially to include a considerable amount of error detection and recovery for such a long run. Estimated stage 2 save file size is ~4 GB.

In Gpuowl V7, P-1 factoring is combined with PRP computations in stage 1, lowering the combined cost for Mersenne numbers https://mersenneforum.org/showpost.p...0&postcount=19, and I think that means that GEC on the PRP may protect its computation somewhat. (If nothing else, unreliable hardware could be detected by a GEC error in the PRP computations.) A Jacobi check is done at start or load and at 1M iteration intervals of P-1 stage 1. https://mersenneforum.org/showpost.p...2&postcount=82
Preda describes stage 2 error resistance in https://mersenneforum.org/showpost.p...7&postcount=98
In https://mersenneforum.org/showpost.p...6&postcount=31 he seems to say that enough gpu memory for at least 24 buffers is required.
Its stage 2 save file size is tiny. Here's an example for 957156667 stage 2 in progress:
Code:
OWL P2 2 957156667 8310000 200235109
Following are comments by Ernst Mayer regarding appropriate further factoring for F33, quoted by permission:
Quote:
based on the TF-to-date and the massive memory needs of ECM, I think only a very deep p-1 attempt is warranted prior to starting the primality test. By deep I mean several years long, perhaps something like this:

Year 1: single fast GPU does deep stage 1. If no error-check mechanism a la Gerbicz is available for that, we'd want 2 separate machines doing identical-bounds stage 1s, to be able to cross-check residues.

Year 2: stage 1 residue gets distributed to multiple users/machines, each of which does one or more work units consisting of powering the stage1 residue to the primes in a given range and doing a gcd to check for a factor. The p-1 code would need to be specially tuned to minimize memory usage: each F33-mod residue is 4GB (in 16 bits/DP form), so we couldn't keep more than 3 such in a Radeon VII's working memory, for instance. I think 32GB is likely the absolute minimum HBM that will be needed for stage 2 work.

The first step should probably be to try p-1 on F31, which has a known factor found via TF: p = 46931635677864055013377, which has p − 1 = 2^33 · 3 · 13 · 140091319777, i.e. needs just a very shallow stage 1 but a deep stage 2 (or a stage 2 using just the largest factor of p-1, 140091319777) to reproduce via p-1.
The proper P-1 bounds remain to be determined. The run times from Ernst imply larger bounds than I used to estimate run times. I don't know of any software currently up to the task of P-1 factoring for F33. Presumably Mlucas and/or gpuowl will be enhanced to meet the specific need.
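
That known F31 factor, and why shallow-B1/deep-B2 finds it, checks out in a couple of lines of Python:
Code:
# q divides F31 = 2^(2^31) + 1 iff 2^(2^31) = -1 (mod q); pow() needs
# only ~31 modular squarings for that. The factorization of q - 1 shows
# B1 can be tiny while B2 must reach the lone large prime 140091319777.
q = 46931635677864055013377
assert pow(2, 2**31, q) == q - 1
assert q - 1 == 2**33 * 3 * 13 * 140091319777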

Ernst writes about doing F33 P-1 stage 1 twice with different systems as a reliability check, before using the stage 1 residue for parallel stage 2 runs on distinct B2 ranges. Stage 1 parallel runs would probably also use differing fft lengths as an error check. It will be interesting to see how many of the error checks used in Gpuowl v7.x P-1 factoring on Mersenne numbers (PRP/GEC interim results multiplied together; Jacobi symbol check), or from prime95/mprime, or otherwise (post 9, post 10) are productively adapted to P-1 factoring or Pepin testing of Fermat numbers.

Numerous questions are posted in the PRP for F33 thread at https://www.mersenneforum.org/showpo...5&postcount=13. No pertinent responses yet as of 2021-3-2.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-03-02 at 19:59 Reason: updated for date, added paragraph on P-1 error checking
Old 2021-01-16, 16:31   #7
kriesel
What is this beast called?

I looked for something related to this by starting with the OEIS for Mersenne exponents for known primes https://oeis.org/A000043, but didn't find anything that seemed related to this exploding sequence.
(If this is already a known sequence, please PM me with a link, and I'll add it and adapt terminology here to match.)

Consider the sequence M*(i), defined by recursion: seed M*0 = 2, M*1 = 2^(M*0) − 1, and for each later i, M*(i) is the Mersenne prime whose position in the ordered list of Mersenne primes is M*(i−1). That is, each term's value gives the list position of the next term.

It doesn't seem of any practical use, since we have already found all 3 of the terms with i>0 that are possible to find.
Let the seed M*0 = 2. Then for
i=1: M*1 = Mp2 = 3 (the first Mersenne prime has exponent 2 and value 3)
i=2: M*2 = Mp3 = 31 (the third Mersenne prime has exponent 5 and value 31)
i=3: M*3 = Mp31 = 2^216091 − 1, the thirty-first in the ordered list of Mersenne primes
i=4: M*4 = Mp(2^216091 − 1) = ? The fourth element is the (M*3)-th Mersenne prime, sure never to be found by humans, and sure to be enormous if it exists.
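
A sketch of the recursion in Python, under that reading (the seed is used once as an exponent; afterwards each value indexes the ordered list of known Mersenne prime exponents, A000043):
Code:
# The 31 Mersenne prime exponents through M31 = 2^216091 - 1 (A000043).
EXPONENTS = [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607,
             1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213,
             19937, 21701, 23209, 44497, 86243, 110503, 132049, 216091]

mstar = [2, 2**2 - 1]  # M*0 = 2 (seed, used as an exponent), M*1 = 3
while mstar[-1] <= len(EXPONENTS):  # last term's value indexes the list
    mstar.append(2 ** EXPONENTS[mstar[-1] - 1] - 1)

print(mstar[:3])                     # [2, 3, 31]
print(len(str(mstar[3])), "digits")  # 65050 digits: M*3 = 2^216091 - 1
# M*4 would be the (2^216091 - 1)-th Mersenne prime: out of reach.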

This sequence's terms grow far faster than even Catalan's C(n+1) = 2^(C(n)) − 1.
Finding M*4 requires finding over 216,000 additional Mersenne primes (indeed, about 10^65049 of them). Assuming the conjecture about Mersenne primes' distribution is correct, M*4 is uncomputable, since we'll run out of matter on which to perform and store the computation, and run out of lifetime of the universe to test the immense number of increasingly huge composites or primes.

Estimating the size of M*4, by M(n+1) conjectured as on average having exponent 1.47576 times M(n)'s, and it seems reasonable to expect a LOT of averaging in over 216,000 intervals:
M*4 ≈ Mp31 × 2^(1.47576×(2^216091 − 1 − 31)) = ?
≈ 10^65049.87 × 10^(0.30103×1.47576×10^65049.87 − 1.47576×9.33×0.30103)
≈ 10^65049.87 × 10^(10^65049.52 − 4.14)
≈ 10^(10^65049.52 + 65049.87 − 4.14)
≈ 10^(10^65049.52 + 65045.73)
≈ 10^(10^65049.52) × 10^65045.73
≈ 10^(10^(10^4.813242)) × 10^(10^4.813219)
The larger term in the product above is large compared to numbers approximating a googolplex, 10^(10^100), which are unstorable in integer form.
For comparison, the estimated number of subatomic particles in the known universe is about 10^80, or 10^89 including photons; no more than about 10^(10^1.95). So storage of work in progress becomes a problem. Very early, as a percentage of completion of identifying M*4. Execution time is also a problem, due both to distances across the storage volume (t = l/c ≈ 1/frequency), and to the enormous number of iterations required for primality testing of enormous exponents.
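
The log arithmetic in the estimate above, reproduced in Python (floats suffice for the bookkeeping; the quantities themselves obviously don't fit):
Code:
# Reproduce the exponent bookkeeping for the M*4 size estimate.
import math
L = math.log10(2)                            # 0.30103
log_mp31 = 216091 * L                        # log10(2^216091)
inner = log_mp31 + math.log10(1.47576 * L)
outer = log_mp31 - 1.47576 * 9.33 * L
print(round(log_mp31, 2), round(inner, 2), round(outer, 2))
print(round(math.log10(inner), 6), round(math.log10(outer), 6))
# ~65049.87 65049.52 65045.73, then ~4.813245 and ~4.813218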

M*4 is far larger than Catalan's C4 or C5:
C4 = M127 = 170141183460469231731687303715884105727 ≈ 1.7×10^38 (prime)
C5 > 10^51217599719369681875006054625051616349 ≈ 10^(10^37.7) (is C5 prime?)
M*4 is prime by definition if it exists, since it is defined as an element in an ordered list of Mersenne primes.

https://primes.utm.edu/notes/faq/NextMersenne.html
https://primes.utm.edu/mersenne/


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-01-18 at 14:11