The exponent is just a string of bits used by the powermod function. There would be no problem extending it to any size, even representing it as a string. Powermod should run at essentially the same speed (for comparable exponent sizes) whether the exponent is a string or an integer.
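A minimal sketch of this idea: a left-to-right square-and-multiply powermod that consumes the exponent as a plain bit string, so the exponent's width is limited only by the string's length. The function name is illustrative, not anything from mfaktc's source.

```python
def powmod_bitstring(base, exp_bits, mod):
    """Compute base**e mod mod, where e is given as a binary string
    (most significant bit first). Left-to-right square-and-multiply:
    one squaring per bit, one extra multiply per set bit."""
    result = 1
    for bit in exp_bits:
        result = (result * result) % mod     # square for every bit
        if bit == '1':
            result = (result * base) % mod   # multiply when the bit is set
    return result

# The same routine works unchanged for exponents beyond 2**32:
e = 2**40 + 3
assert powmod_bitstring(3, bin(e)[2:], 2**61 - 1) == pow(3, e, 2**61 - 1)
```

As the assertion shows, the result matches Python's integer-exponent `pow`; the cost is driven by the number of bits processed, not by how the exponent is stored.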
|
There's quite a lot of work left yet below 10[SUP]9[/SUP], much less 2[SUP]32[/SUP] (4.29x10[SUP]9[/SUP]).
Full gputo72 trial factoring depth for 2 primality tests saved goes up about 3 bits per doubling of exponent, but it goes up at a slower rate at higher exponents. The following figures are for a gtx1070; others are similar:

[code]
 25M   71 bits
 50M   74
100M   77
200M   80
400M   83
800M   85
  1G   86   <- mersenne.org exponent assignment limit
1.6G   88
3.2G   90   <- mfakto factor depth limit
  4G   90.5 <- mfaktx exponent current limit; mersenne.ca assignment limit; mfakto factor depth limit 92 bits
  5G   91   (here and larger exponents are extrapolated)
  8G   92
 16G   94
 24G   95   <- mfaktc factor depth limit 95 bits
[/code]

Run times per exponent drop about in half per bit of depth when the exponent doubles, but each next bit of depth takes twice as long, so a doubled exponent takes ~4 times as long to run to its full depth as an exponent half that large. Plus there are more of them, nearly double. For those who can't resist factoring stratospheric exponents, there's Factor5, ready to go. But the most fertile ground for finding primes is the smaller available exponents. Factoring is a means to that end. |
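A quick back-of-the-envelope check of that ~4x scaling (an illustrative cost model, not mfaktc's actual timing): candidate factors of a Mersenne number M(p) have the form 2kp+1, so the number of candidates below 2^b scales as ~2^b/(2p). Per-bit-level cost therefore halves when p doubles, while each added bit level doubles the cost, and the top level dominates the total.

```python
def level_cost(p, b):
    # candidates of form 2*k*p + 1 below 2^b  ->  roughly 2^b / (2p) of them
    return 2**b / (2 * p)

def full_depth_cost(p, depth, floor=60):
    # total (relative) cost to TF from 2^floor up to 2^depth
    return sum(level_cost(p, b) for b in range(floor + 1, depth + 1))

# Depth rises ~3 bits per exponent doubling (e.g. 100M -> 77, 200M -> 80),
# so the doubled exponent should cost about 2^3 / 2 = 4x as much:
r = full_depth_cost(200_000_000, 80) / full_depth_cost(100_000_000, 77)
assert 3.5 < r < 4.5
```

The geometric sum is dominated by its top term, which is why adding 3 bits of depth while doubling p nets out to roughly a factor of four.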
[QUOTE=kriesel;490535]There's quite a lot of work left yet below 10[SUP]9[/SUP], much less 2[SUP]32[/SUP] (4.29x10[SUP]9[/SUP])[/QUOTE]By my rough calculations, there's about 50 million THz-years of TF work just from M10[sup]9[/sup]-M2[sup]32[/sup]. Assuming you have a single GTX 1080 that does ~1000GHz-days/day, that's about 18 billion years. :shock: Make it 20 if you want to finish off the Primenet range first. :smile:
|
[QUOTE=James Heinrich;490538]By my rough calculations, there's about 50 million THz-years of TF work just from M10[sup]9[/sup]-M2[sup]32[/sup]. Assuming you have a single GTX 1080 that does ~1000GHz-days/day, that's about 18 billion years. :shock: Make it 20 if you want to finish off the Primenet range first. :smile:[/QUOTE]
All of PrimeNet amounts to ~190 such gpus. [url]https://www.mersenne.org/primenet/[/url] I was just on ebay bidding on one. EERY! (Next up, a heat shield so the sun doesn't melt it?) |
[QUOTE=kriesel;490535]Full gputo72 trial factoring depth for 2 primality tests saved goes up about 3 bits per doubling of exponent. But it goes up at a slower rate at higher exponents.[/QUOTE]
I /think/ I understand what you're saying here. Yes, TF'ing with a GPU adds about three bits to the depth done. And, yes, the GHzD/D drops (slightly) the higher the exponent, but so does the amount of work needed for each bit-level. For example, TF'ing from 75 to 76 bits for an 87M candidate takes about 88 GHzDs, while TF'ing the same depth for an 89M candidate takes about 86 GHzDs. The drop in GHzD/D executed by the different cards just about matches the drop in needed cycles. And, at the same time, the amount of P-1 and LL GHzD saved increases whenever a factor is found for a higher candidate. At the end of the day, it doesn't really make much sense to do anything but TF'ing on a GPU. But people are free to do whatever they want with their kit, time, electrons and money! :smile: |
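The near-cancellation described above can be sanity-checked with a simple model (my own assumption, not the official PrimeNet credit formula): since candidate factors have the form 2kp+1, the work for one bit level scales roughly as 2^b / p.

```python
def tf_ghzdays(p, b, k):
    # rough model: work for TF from 2^(b-1) to 2^b scales as 2^b / p
    return k * 2**b / p

# Calibrate the constant on the 87M figure quoted above
# (75 -> 76 bits, ~88 GHz-days):
k = 88 * 87e6 / 2**76

# Predict the same bit level for an 89M candidate:
est_89M = tf_ghzdays(89e6, 76, k)
assert abs(est_89M - 86) < 1   # close to the quoted ~86 GHz-days
```

The model reproduces the 88-vs-86 GHzD pair to within a fraction of a GHz-day, which is consistent with the point that the work per bit level drops in proportion to the exponent.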
[QUOTE=LaurV;490494]The exponent is just a string of bits used by the powermod function. There would be no problem to extend it to any value, including to make it string. Powermod should work the same fast (for compatible sizes of the exponent) if it is string or integer.[/QUOTE]
Re extending mfaktx to exponents beyond 2^32, the author of mfaktc addressed this question, around the time of v0.17 mfaktc, in [URL]http://www.mersenneforum.org/showpost.php?p=267892&postcount=1148[/URL] [QUOTE=TheJudger;267892]Not really "hard" but alot of work. The exponent is represented in a single 32bit unsigned integer in mfaktc. Main task in the host code: extend the per class sieve initializations for bigger exponents. This part is not really performance critical as long as the job is "big enough" (runtime per class > than a few seconds). Simple approach: use libgmp. The GPU code "just needs bigger numbers", too. The bad news are that you have to rewrite the complete set of functions (add, sub, mul, div, mod, ...) for bigger numbers.[/QUOTE] |
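To illustrate what "rewrite the complete set of functions (add, sub, mul, div, mod, ...) for bigger numbers" means in practice, here is a toy sketch of add-with-carry on numbers stored as three 32-bit limbs, the way a 96-bit GPU kernel represents values that exceed native word size. Names and limb count are illustrative only; mfaktc's actual kernels differ.

```python
MASK32 = 0xFFFFFFFF

def add96(a, b):
    """Add two 96-bit numbers, each a (lo, mid, hi) tuple of 32-bit limbs,
    propagating the carry limb by limb as a GPU kernel would."""
    lo  = a[0] + b[0]
    mid = a[1] + b[1] + (lo >> 32)              # carry out of low limb
    hi  = (a[2] + b[2] + (mid >> 32)) & MASK32  # carry out of middle limb
    return (lo & MASK32, mid & MASK32, hi)

def to_limbs(n):
    return (n & MASK32, (n >> 32) & MASK32, (n >> 64) & MASK32)

def from_limbs(t):
    return t[0] | (t[1] << 32) | (t[2] << 64)

x, y = 2**90 + 12345, 2**80 + 67890
assert from_limbs(add96(to_limbs(x), to_limbs(y))) == (x + y) % 2**96
```

Every other primitive (sub, mul, mod) needs the same limb-wise treatment, which is why TheJudger calls the extension "not really hard but a lot of work."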
[QUOTE=Bdot;263174]This is an early announcement that I have ported parts of Olivers (aka TheJudger) mfaktc to OpenCL.
Currently, I have only the Win64 binary, running an adapted version of Olivers 71-bit-mul24 kernel. Not yet optimized, not yet making use of the vectors available in OpenCL. A very simple (and slow) 95-bit kernel is there as well so that the complete selftest finished successfully on my box. On my HD5750 it runs about 60M/s in the 50M exponent range - certainly a lot of headroom :smile: As I have only this one ATI GPU I wanted to see if anyone would be willing to help testing on different hardware. Current requirements: OpenCL 1.1 (i.e. only ATI GPUs), Windows 64-bit. There's still a lot of work until I may eventually release this to the public, but I'm optimistic for the summer. Next steps (unordered): [LIST][*]Linux port (Is Windows 32-bit needed too?)[*]check, if [URL]http://mersenneforum.org/showpost.php?p=258140&postcount=7[/URL] can be used (looks like it's way faster)[*]fast 92/95-bit kernels (barrett)[*]use of vector data types[*]various other performance/optimization tests&enhancements[*]of course, bug fixes:boxer:[*]docs and licensing stuff :yucky:[*]clarify if/how this new kid may contribute to primenet[/LIST]Bdot[/QUOTE] I just filed an issue on mfakto compilation on Debian: [url]https://github.com/Bdot42/mfakto/issues/5[/url] |
[QUOTE=chalsall;490542]I /think/ I understand what you're saying here.
Yes, TF'ing with a GPU adds about three bits to the depth done. And, yes, the GHzD/D drops (slightly) the higher the exponent, but so does the amount of work needed for each bit-level. As in, for example, TF'ing from 75 to 76 bits for a 87M candidate takes about 88 GHzDs, while TF'ing the same depth for a 89M candidate takes about 86 GHzDs. The drop in GHzD/D executed by the different cards just about matches the drop in needed cycles. And, at the same time, the amount of P-1 and LL GHzD saved increases whenever a factor is found for a higher candidate.:smile:[/QUOTE]Trying to clarify (and cleaning up a bit about bit depth limits): Full gputo72 trial factoring depth for 2 primality tests saved goes up about 3 bits per doubling of exponent, up to around 400M exponent. Above that it rises at a progressively lower rate, declining from 3 bits per octave (doubling of exponent) to 2 and then 1.5 bits per octave, judging from James Heinrich's charts. As far as the "wavefront" of production assignments goes, those high exponents lie in the far future (decades away, assuming typical advances in hardware speed). The following figures are for a gtx1070; others are similar:

[code]
 25M   71 bits
 50M   74
100M   77
200M   80
400M   83   <- ~3 bits/octave to here
800M   85   <- ~2.5 bits/octave
  1G   86   <- mersenne.org exponent assignment limit
1.6G   88
3.2G   90   <- 2 bits/octave
  4G   90.5 <- mfaktx exponent current limit; mersenne.ca assignment limit
  5G   91   (here and larger exponents are extrapolated; the multiprecision rewrite of mfaktx would likely yield a lower bits tradeoff at exponents >4G than this projection shows)
  8G   92   <- mfakto factor depth limit is 92 bits (estimated 1.5-2 bits depth/octave above 4G)
 16G   94
 24G   95   <- mfaktc factor depth limit is 95 bits
[/code]

There's plenty of work to do, well below 1G. My current estimate is that the GIMPS wavefront reaches ~500M about 50 years from now. 
(At that scale, it doesn't matter much whether the wavefront being considered is TF, P-1, or primality testing. Offsets between the TF, P-1, and primality-testing wavefronts, of a year or more (~8M in exponent value), are not significant compared to 50 years and 500M.) TF is best done on GPUs, and it is the most productive use for them, though on occasion it can be useful to do otherwise. An old, slow, small-RAM CPU may only be suitable for TF. A P-1 of 544M takes about a year in prime95 on an i3. The same could be done in a week on a GPU, if the CUDAPm1 code could manage to complete such a large exponent. (CUDAPm1 v0.20 can't do both stages on any GPU model I've tried above ~431M, but that's another story.) |
comments in worktodo file
While looking for something else, I stumbled across this:
The source of parse.c for CUDAPm1 indicates that # or \\ or / are comment characters marking the rest of a worktodo line as a comment, and that the file came originally from mfaktc. Testing in mfakto, I've confirmed that \\ worked, while # and / did not in my test, which placed them at the beginnings of separate records. I could tell which did or did not work by the line numbers in the warning messages. This capability is not (yet) mentioned in the readme.txt, that I recall. |
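A sketch of the comment handling described above, based on the behavior reported for parse.c rather than on the actual source (the helper name is made up):

```python
# Comment characters as described for CUDAPm1's parse.c; note the test
# above found that mfakto itself honored only the backslash.
COMMENT_CHARS = ('#', '\\', '/')

def strip_worktodo_comment(line):
    """Truncate a worktodo line at the first comment character found."""
    cut = len(line)
    for ch in COMMENT_CHARS:
        pos = line.find(ch)
        if pos != -1:
            cut = min(cut, pos)
    return line[:cut].rstrip()

assert strip_worktodo_comment("Factor=12345677,70,71 # note") == "Factor=12345677,70,71"
assert strip_worktodo_comment("\\ whole-line comment") == ""
```

A whole-line comment is simply a line whose first character is a comment character, which matches the test above that placed the markers at the beginnings of separate records.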
mfakto.ini parameters
I started using mfakto with an Intel HD4400 GPU. Not as speedy as a "real" Nvidia or AMD, but worth at least 5 more GHz-days/day.
As I had never used mfakto before, I noticed some tuning parameters in the INI file that I never encountered in mfaktc for Nvidia:

[code]
# Different GPUs may have their best performance with different kernels
# Here, you can give a hint to mfakto on how to optimize the kernels.
#
# Possible values:
# GPUType=AUTO    try to auto-detect, if that does not work: let me know
# GPUType=GCN     Tahiti et al. (HD77xx-HD79xx), also assumed for unknown devices.
# GPUType=VLIW4   Cayman (HD69xx)
# GPUType=VLIW5   most other AMD GPUs (HD4xxx, HD5xxx, HD62xx-HD68xx)
# GPUType=APU     all APUs (C-30 - C-60, E-240 - E-450, A2-3200 - A8-3870K) not sure if the "small" APUs would work better as VLIW5.
# GPUType=CPU     all CPUs (when GPU not found, or forced to CPU)
# GPUType=NVIDIA  reserved for Nvidia-OpenCL. Currently mapped to "CPU" and not yet functional on Nvidia Hardware.
# GPUType=INTEL   reserved for Intel-OpenCL (e.g. HD4000). Not yet functional.
#
# Default: GPUType=AUTO

GPUType=AUTO
[/code]

What should I use with my Intel environment, INTEL or AUTO? And how can I get the best from it? Should I tweak [COLOR="Red"]GPUSievePrimes[/COLOR], [COLOR="red"]GPUSieveSize[/COLOR] or [COLOR="red"]GPUSieveProcessSize[/COLOR] to get the best performance on my Intel GPU?

Luigi |
I tried this in the past but it didn't really work: if you use 'auto' it defaults to a very slow, partially functional 'gcn'. You may be able to squeeze out a fraction of a GHz-day if you tune, but it's not worth the effort.
Edit: Whoops, didn't see the other thread. |