[QUOTE=preda;494010]Yes, right. I'll fix.
Let me explain why: a 1-bit reduction in exponent means that a 1-bit reduction in TF depth produces the same number of Ks (factor candidates, FC). *BUT* a 1-bit reduction in exponent also means a 2-bit (i.e. four-fold) reduction in PRP time. So 3 bits in total.[/QUOTE] A useful explanation, although I think you've left some nuances out. FFT multiplication (and so also squaring) is typically given as effort proportional to n log n log log n, where n is the number of bits, i.e. the Mersenne exponent. And since there are approximately n iterations in the primality test, and reduced-length early iterations are not worth bothering with, primality tests have effort proportional to n[SUP]2[/SUP] log n log log n. That deviates a bit from the 2.0 power you imply; I often get around power 2.03 to 2.11 when I chart empirical data (mostly CUDALucas). For some reason, perhaps because the cost of extending precision in TF differs from that in FFT LL or PRP, the number of TF bits per octave going up the exponent scale, for the optimal TF/primality-testing tradeoff on the same gpu, is apparently not constant. (Put another way, the GhzD/day rating of gpu throughput is a function of hardware design, software efficiency, computation type, exponent, and other related variables including TF depth, fft length, etc. There's no reason to expect the form of the rating function for TF and that for primality testing to be the same with regard to exponent, nor the optimal TF tradeoff to be a simple single-term function of exponent such as bits-to-TF = k log exponent.) It can be seen in James Heinrich's guidance, presumably based on mfaktc/mfakto versus CUDALucas/clLucas performance benchmarking, for example [URL="http://www.mersenne.ca/cudalucas.php?model=684&mmin=100&mmax=1000"]www.mersenne.ca/cudalucas.php?model=684&mmin=100&mmax=1000[/URL] (or higher), or [URL]http://www.mersenneforum.org/showthread.php?p=492519#post492519[/URL], that the number of TF bits/octave of exponent rolls off above ~500M exponent.
However, 3 bits/octave seems to be a pretty good approximation for p<10[SUP]9[/SUP], which will keep us and our gpus busy for a very long time, and has the virtue of simplicity. As primality testing in gpuOwL has recently become considerably more efficient, perhaps approaching triple the speed of clLucas on the same hardware, its optimal TF/PRP tradeoff has some reason to differ from the Heinrich charts. What testing of the tradeoff have you done for gpuOwL v3.7? Does the set of bit levels assume tradeoff of TF against a single primality test, or two? |
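As a sanity check on the effort model above, here is a small Python sketch (the exponent values are illustrative, not benchmark data). It shows that the n[SUP]2[/SUP] log n log log n model predicts an effective power slightly above 2.0 over one doubling near current exponents, consistent with the 2.03 to 2.11 empirical range mentioned:

```python
import math

def prp_effort(n):
    # Modeled primality-test cost: ~n iterations of an FFT squaring
    # of cost ~ n log n log log n, giving ~ n^2 log n log log n total.
    return n * n * math.log(n) * math.log(math.log(n))

def effective_power(n1, n2):
    # Exponent k such that effort scales like n^k between n1 and n2.
    return math.log(prp_effort(n2) / prp_effort(n1)) / math.log(n2 / n1)

k = effective_power(166_000_000, 332_000_000)  # one doubling near 332M
```

The log and log log factors are what push the effective power above 2.0; they grow so slowly that the power only creeps up across the practical exponent range.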
[QUOTE=preda;494005]Some basic TF (trial factoring) is now integrated in GpuOwl (OpenCL only initially), in v3.7.
For now the TF part is only known to work with ROCm 1.8.2, as it makes use of some specific extension there. ... By default, the "target TF bit level" is 81 for 332M exponents, 80 for 160M exponents and 79 for 80M exponents. This is what is used with "-tf 0". An offset other than 0 can be specified to change the "target" up or down, e.g.: "-tf -1" if 1-bit-less TF is desired. ... The output/log also changed in 3.7, to uniformly prefix time and cpu-name to every output line.[/QUOTE] Congrats on the rapid advance in gpuOwL capabilities, particularly in the last month or so.:groupwave: Do you have any plans to make TF available in gpuOwL on CUDA? (Saw your note about 128-bit only in ROCm linux OpenCL for now.) Separate posts already discuss a change in the default TF level behavior, and the reasoning. Please stabilize the log format.:ouch2: |
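The stated defaults can be sketched as a tiny helper. Note the exponent band boundaries below are my guesses, since the post only gives the three sample points (80M/160M/332M), and the offset argument mirrors the "-tf" option:

```python
def target_tf_bits(exponent, tf_offset=0):
    # Default "target TF bit level" per the v3.7 description:
    # 81 around 332M, 80 around 160M, 79 around 80M exponents.
    # Band boundaries are assumed here, not taken from gpuOwl source.
    if exponent >= 332_000_000:
        base = 81
    elif exponent >= 160_000_000:
        base = 80
    else:
        base = 79
    return base + tf_offset  # e.g. tf_offset = -1 for "-tf -1"
```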
Because of the recent [URL="http://mersenneforum.org/showthread.php?t=23462"]very rapid Gerbicz cofactor-compositeness test[/URL], the ideal format for PRP testing would be to output large residues (say 2048 bits) of Type 5, rather than 64-bit residues of Type 1.
Although Primenet is not ready to accept such large residues, perhaps it could at least be an option in the program. |
[QUOTE=Mark Rose;494022]May I also suggest supporting the format:
Factor=N/A,332298607,76,77 It's supported by other tools.[/QUOTE] I added this format too. |
[QUOTE=SELROC;494020]Hugh sorry, I didn't intend to hurt you.
Absolutely. I am happy as is and you should be too :-)[/QUOTE] Nothing to worry about; my message probably came across harsher than I intended. Everything's fine. |
[QUOTE=GP2;494038]Because of the recent [URL="http://mersenneforum.org/showthread.php?t=23462"]very rapid Gerbicz cofactor-compositeness test[/URL], the ideal format for PRP testing would be to output large residues (say 2048 bits) of Type 5, rather than 64-bit residues of Type 1.
Although Primenet is not ready to accept such large residues, perhaps it could at least be an option in the program.[/QUOTE] The result is JSON now. I wonder: if I add an additional field to the result with this information, would it still be accepted by the parser, perhaps ignoring/dropping the new big residue? |
[QUOTE=GP2;494038]Because of the recent [URL="http://mersenneforum.org/showthread.php?t=23462"]very rapid Gerbicz cofactor-compositeness test[/URL], the ideal format for PRP testing would be to output large residues (say 2048 bits) of Type 5, rather than 64-bit residues of Type 1.
Although Primenet is not ready to accept such large residues, perhaps it could at least be an option in the program.[/QUOTE] What do you mean by "type 5", is it what's here: [url]http://www.mersenneforum.org/showpost.php?p=468378&postcount=209[/url] I understand that what's needed for the 512-bit "fat residue" is: (3^Mp % Mp), correct? But that kind of residue doesn't have a "type" in George's list of types 1-5 above. PS: maybe this is elucidated here [url]http://www.mersenneforum.org/showpost.php?p=468381&postcount=210[/url] , where it is said that "type 5" really is (3^Mp). |
[QUOTE=kriesel;494029]What testing of the tradeoff have you done for gpuOwL v3.7? Does the set of bit levels assume tradeoff of TF against a single primality test, or two?[/QUOTE]
My reasoning goes like this: TF of a 332M exponent from 80 to 81 bits takes about 15 hours, while a PRP test of the same exponent takes about 32 days; thus the time ratio PRP/TF is about 50. The probability of finding a factor during that TF step is about 1/80. So we pay 1/50 of the time for a 1/80 chance of skipping the PRP. Not quite worth it, but not far from it. Thus, when "saving 1 PRP test" (i.e. no PRP DC), the optimal bit level is between 80 and 81. OTOH, when "saving two PRP tests" (i.e. with DC), the optimal bit level moves to between 81 and 82. So I just set the default for a 332M exponent at 81 bits, and navigate up or down from this exponent with a 2.5 bits-per-doubling rate, like this: targetBitlevel = 81 + 2.5 * (log2(exponent) - log2(332M)) |
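The closing formula is directly computable; a minimal Python sketch (how the fractional result maps to an integer bit level is not stated in the post, so no rounding is applied here):

```python
import math

def target_bitlevel(exponent):
    # 81 bits at a 332M exponent, +/- 2.5 bits per doubling of the exponent.
    return 81 + 2.5 * (math.log2(exponent) - math.log2(332_000_000))
```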
[QUOTE=kriesel;494032]
Do you have any plans to make TF available in gpuOwL on CUDA? (Saw your note about 128 bit only in ROCm linux OpenCL for now.) [/QUOTE] Not right now. A couple other things on the wait-list have higher priority, and TF in gpuOwl is not a critical need (as there are good alternatives). [QUOTE]Please stabilize the log format.:ouch2:[/QUOTE] Yes, sorry about that. I did the change now because of the integration of TF with the log. |
[QUOTE=preda;494052]What do you mean by "type 5", is it what's here:
[url]http://www.mersenneforum.org/showpost.php?p=468378&postcount=209[/url] I understand that what's needed for the 512-bit "fat residue" is: (3^Mp % Mp), correct? But that kind of residue doesn't have a "type" in George's list of types 1-5 above. PS: maybe this is elucidated here [url]http://www.mersenneforum.org/showpost.php?p=468381&postcount=210[/url] , where it is said that "type 5" really is (3^Mp).[/QUOTE] Sorry, I must be confused. It's PRP type 1 after all. If you create a worktodo.txt for mprime containing: [CODE] PRP=1,2,p,-1 [/CODE] then for p = 523 the residue is D749C83A364C1462 (reported as "type 1"), which is (a^(2^p - 2) mod (2^p - 1)) mod (2^64), for a = 3. I think I got the residue type confused because I've been doing stuff with Wagstaff numbers, where you'd have: [CODE] PRP=1,2,p,1,"3" [/CODE] and then for p = 523 the residue is 388104444C578BC6 (reported as "type 5"), which is (a^(2^p) mod (2^p + 1)) mod (2^64), for a = 3. And here, because it's a cofactor, the residue type really is "type 5". So the only difference would be substituting a large residue instead of 64 bits. I think 2048 is better than 512, because [URL="http://www.mersenne.ca/manyfactors.php"]there are already a few exponents where the product of known factors approaches or exceeds 512 bits[/URL], so 2048 is more "future-proof". |
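Both residue definitions quoted above are easy to reproduce; a Python sketch (function names are mine) computes the 64-bit type-1 and type-5 residues directly with modular exponentiation:

```python
def mersenne_prp_res64(p, a=3):
    # Type 1: (a^(2^p - 2) mod (2^p - 1)) mod 2^64.
    # Note 2^p - 2 = M(p) - 1, so this is a Fermat-style PRP test of M(p).
    m = (1 << p) - 1
    return pow(a, m - 1, m) % (1 << 64)

def wagstaff_prp_res64(p, a=3):
    # Type 5: (a^(2^p) mod (2^p + 1)) mod 2^64, as used for Wagstaff numbers.
    n = (1 << p) + 1
    return pow(a, 1 << p, n) % (1 << 64)
```

For a Mersenne prime exponent such as p = 5 (M(5) = 31), the type-1 residue is 1 by Fermat's little theorem. The full value before the final mod 2^64 truncation is what a 2048-bit "fat residue" report would carry instead.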
[QUOTE=preda;494048]The result is JSON now. I wonder, if I add an additional field to the result with this information, would it still be accepted by the parser, maybe ignoring/dropping the new big-residue.[/QUOTE]
The raw results text gets stored as a varchar(1024), so if we keep it under that length it would be okay in theory. I haven't seen how James implemented the parsing when a result comes in (I'd assume using some native PHP JSON parsing), but on the SQL side I'm using the native JSON features, so it will essentially ignore anything I'm not specifically looking for. There would be certain *required* fields, for sure, but as far as including extra info goes, I guess we'd just want to test it and make sure. |
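A quick way to see why an extra JSON field is harmless to a field-by-field parser, and that a 2048-bit (512 hex character) residue fits within varchar(1024) with room to spare, is a small Python sketch. The field names and "worktype" value are illustrative, not gpuOwl's actual schema; the exponent and res64 values are taken from earlier posts:

```python
import json

# A hypothetical result line with an added 2048-bit residue field.
line = json.dumps({
    "exponent": 332298607,
    "worktype": "PRP-3",
    "res64": "D749C83A364C1462",
    "res2048": "AB" * 256,  # dummy 512-hex-char payload
})

def parse_known_fields(text):
    # Extract only the fields the server knows about; unknown fields
    # are simply left behind, as with any native JSON parsing.
    obj = json.loads(text)
    return {k: obj[k] for k in ("exponent", "res64") if k in obj}

parsed = parse_known_fields(line)
```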